To

Lawrence B. Charry
1921-1982 

Those who knew Larry Charry were saddened by his death in June 1982. He 
touched the lives of many people— a third grader who did one of his crossword 
puzzles, a student in one of his classes, a group of teachers whom he addressed. 

A person of unlimited energy, Larry was an organizer, a planner, a leader. He 
was a teacher for thirty-three years. He taught reading/study skills in high school 
long before others recognized that need; he taught at West Chester University and 
Trenton State College. Larry investigated new ideas. He was interested in the use of
computers in education, developed crossword puzzles to help students improve 
their reading comprehension, and he wrote and edited reading material for chil- 
dren. He felt that readability was an area neglected by many classroom teachers, 
and he founded the IRA Readability Special Interest Group to encourage the sharing
of ideas and the dissemination of information. 

Much more could be written about Larry and his accomplishments, but we could 
never describe adequately his ability to motivate people, his friendliness, his sin- 
cerity, or his enthusiasm for everything he approached. Larry is missed, but 
remembered. 

John E. Boyd 
St. Peters, Pennsylvania 
Foreword



This book is organized around the theme of the past, present, 
and future of readability research. I shall draw on the articles 
in this volume to highlight the changes that occurred from the past 
to the present and will make some predictions about future readability
research.

Past research in readability was atheoretical. Chall points out 
that the pioneers in the 1920s and 1930s tried numerous variables 
before discovering that sentence length and word difficulty were the 
best predictors of readability. They are not causes but indices of the 
semantic and syntactic difficulty of texts. Word frequency counts 
and readability criteria influenced basal readers from the 1920s to 
the 1960s when they used a whole word method of teaching begin- 
ning reading. In a review of research, Rabin demonstrates that read- 
ability formulas with language-required modifications have had a 
great impact throughout the world. Their impact is likely to con- 
tinue until more appropriate and useful criteria displace the current, 
easy to use formulas. 

An often cited misuse of readability formulas is the applica- 
tion of word and sentence length as writing criteria to make texts fit 
particular grade levels. Fry responds to this criticism by formulat- 
ing readability inspired writing criteria to make texts comprehensi- 
ble. His article might start a writeability research thrust. 

Past readability criteria focused on the use of text characteris- 
tics to predict the grade at which an average reader would compre- 
hend the text. It was assumed that the reader had the necessary 
resources for comprehension. The cloze technique does not predict 
and assume; it provides an actual try out on the material. 



Researchers have questioned and experimented with the cloze 
technique word deletion rule for assessing comprehension. Binkley 
explains that some of the experimentation is to determine the text 
factors that influence learning and memory. Using cohesion analy- 
sis, she examines a text to identify the writer's cohesive style and 
then deletes cohesive ties to reflect this style. Analysis of the re- 
sponses provides diagnostic information about the student— infor- 
mation on the student's intersentential integration ability and where 
the student should be placed on a reading development continuum. 

In a novel approach to searching for alternatives to traditional 
readability criteria, Davison interviewed librarians and publishers 
of children's materials. She concludes that a set of principles can be 
developed for grading reading difficulty without using formulas for 
writing and editing. 

Klare analytically reviews readability formulas from the past 
to the present. He concludes by pointing to Zakaluk and Samuels' 
method as the newest approach to readability. 

Using an explicit theoretical formulation, Zakaluk and 
Samuels argue that a text's readability is a function of an interaction 
between text characteristics and reader resources. They then mea- 
sure text difficulty and reader resources and insert this information 
into a nomograph to determine the readability of the text for individ- 
ual readers. Their method is time consuming as they have to mea- 
sure both text characteristics and reader resources (word recognition 
skills and prior knowledge for the text's topic). However, it is the 
beginning of research on a text-reader model of readability. 

This book demonstrates that readability research has pro- 
gressed from an atheoretical to a theoretical basis. In the past, re- 
searchers focused on text characteristics to predict readability. 
Currently, they are assessing text characteristics in interaction with 
reader resources. Researchers also are using readability procedures 
to test specific hypotheses derived from cognitive theory, thus at- 
tempting to find causal factors of readability. 

Although predicting the future is hazardous, I think the inter- 
active model will be a focal point of future research on readability. 
Researchers will continue to investigate hypotheses on the relationship
between specific text factors and students' cognitive processes.
The effects the teacher has on modifying text, enhancing reader re- 
sources, and establishing comprehension goals will enter into re- 
search on readability in a classroom setting. 

This book should be appealing to students who want a short 
but up to date overview, researchers who are interested in a critical 
appraisal, and consumers who would like to know what leaders in 
the field think about the past, present, and future of readability
research.

Harry Singer 
University of California at Riverside 
Preface



In 1982, while preparing for the 1983 IRA Convention, Lawrence
Charry, founder and program chairperson of the Readability 
Special Interest Group, decided it was time to take a long look at 
readability: "past, present, and future." He stated, "Readability has
been around fifty years, more or less. Has it worked? What are the 
benefits? What are the weaknesses?" He questioned also whether 
readability formulas were being used as they had been intended. He 
projected a careful look at the state of the art by those who could 
offer the greatest insight. 

Although Dr. Charry did not live to see his plans come to 
fruition, a seminar, "Readability: An Historical Approach," was pre- 
sented at the IRA Annual Convention in Anaheim in May 1983. 
Speakers at the seminar included Jeanne S. Chall, George R. Klare, 
Edward B. Fry, and S. Jay Samuels. 

Recognizing the historical significance of the event, Samuels
and Zakaluk volunteered to edit a monograph on readability in 
which the Anaheim papers would be included. They developed the 
framework for what follows, combining updated versions of the 
original papers with other manuscripts on related topics. Included in 
the latter group is information on readability research on text written
in languages other than English, writeability, and nonconventional
approaches to estimating text difficulty.

Since work was begun on this monograph, IRA and NCTE have
issued a joint statement regarding the possible harmful effects of the
uncritical use of readability formulas. This work was not intended as
either an apology for or a defense of the use of readability instruments.
It is a description of the status quo by those who are best
qualified to give it. However, readability instruments would not
have been developed had there not been a need for them, and it 
would be a great loss if their detractors were responsible for the ces- 
sation of current worldwide research. If we disregard all we have 
learned until now, we will be in the position of those in Venezuela as 
described by Nelson Rodriguez-Trujillo in the June/July 1985 issue 
of the IRA's Reading Today.

In Spanish, we are at the other end of the spectrum. In this
language, we confront the situation of having no readability
formula good enough to explain even some of the variability
of language difficulty and are suffering that absence. In
evaluation committees, people argue and counterargue trying
to decide whether materials are of a certain difficulty or
appropriate for certain children. At the end, the issue is resolved
on the basis of personal opinion, having in mind
some abstract child. It is also a common situation to see
teachers in the classroom selecting texts that are too difficult
for children and forcing them to suffer frustration. The lack 
of information on readability of materials prevents teachers 
from responding to the students' different levels of ability. 

The goal of this volume has been to review the field of read- 
ability and to suggest possible new directions. We hope we have 
been successful and that we have been able to clarify some of the 
issues, questions, and concerns readers might have regarding the 
use of formulas to evaluate written materials. 

Annette T. Rabin 
Lincoln University 
The Beginning Years



Jeanne S. Chall 



The study of readability, in the sense of language comprehensi- 
bility, has a long history. It has deep roots in the classical rhet- 
oric of Plato and Aristotle and in the vocabulary analyses of the 
Bible by ancient Hebrew scholars (Lorge, 1944). Although these 
ancient sources continue to enrich our understanding of language 
and text effectiveness, this chapter covers a shorter and more recent 
history of readability that began in the 1920s. It focuses on the con- 
tributions of educational researchers and on the use of readability in 
education. There is a considerable body of research and application 
of readability to general communication that, because of space limi- 
tations, cannot be covered here. 

The Beginning Years of Readability Measurement 

The beginnings of readability research came from two main 
sources— studies of vocabulary control and studies of readability 
measurement. Vocabulary control studies were concerned with the 
vocabularies that would be most effective for learning to read from 
reading textbooks. Specifically, they studied "new words" in each 
book, the number of times they were repeated, and their difficulty.

Readability measurement came from an interest in the com- 
prehension difficulty of content area textbooks. During the begin- 
ning years, readability researchers devised procedures and 
instruments that would reliably and validly distinguish easier from 
more difficult texts or grade texts in order of difficulty. 
Thus, the vocabulary control studies and the readability studies
had similar purposes. Both sought objective means of measuring
the difficulty of printed materials for learning and for comprehen- 
sion. The vocabulary control studies concentrated mainly on pri- 
mary level textbooks, while the readability studies were more 
interested in the comprehensibility of content texts and other materials
written for students of middle and upper elementary grades, high
school, college, and adults. Both areas of research started in the 
1920s, although vocabulary control studies were more prevalent 
during the earlier years. 

Both sets of investigations were concerned that the existing 
primary basal readers and content texts were too difficult for most 
students for whom they were intended. An early vocabulary study of 
fourth grade reading textbooks found a variation of from 20 to 40 
percent in "unknown" words (Dolch, 1928). The impetus for the 
first readability study (Lively & Pressey, 1923) came from teachers
who reported an unusual number of technical terms in junior high 
school science books, so that the study of the subject necessitated 
acquiring a scientific vocabulary rather than the learning of scientific
facts and generalizations.

Thus, the first study of readability, similar to the early vocab- 
ulary control studies, was concerned not only with objective proce- 
dures for estimating difficulty but with making texts more readable 
for the students who used them. Why did this occur in the 1920s? 

One factor was the publication in 1921 of the first extensive 
frequency word count of the English language, Thorndike's Teach- 
er's Word Book, which provided an objective measure of word diffi- 
culty. 

Another hypothesis is that the junior and senior high school 
population was changing in the 1920s. The population began to in- 
clude more students who previously would have completed their 
formal schooling in the elementary grades. These "new students"
were the first generation in their families to attend secondary 
school. Textbooks written for the earlier secondary school popula- 
tion, with stronger academic backgrounds, may have been too diffi- 
cult for many of the newer secondary students. 
It is harder to hypothesize why easier reading textbooks were
sought for the primary grades, since compulsory schooling for ele- 
mentary age children had been in effect for many years. One possi- 
bility is that more children were entering the elementary grades 
without the knowledge of English needed for reading existing text- 
books. Interestingly, this is not mentioned in the early research liter- 
ature on vocabulary control. A more likely hypothesis is the change 
in the 1920s from a heavier phonic to a heavier sight word approach 
for teaching beginning reading. The greater emphasis on sight rec- 
ognition may have resulted in a need for lower vocabulary loads in 
reading textbooks, particularly in the primary grades. That this hy- 
pothesis has some validity is seen in the historical changes in the 
number of words in basal readers. Vocabulary counts decreased 
substantially from the 1920s to the 1960s, the years when sight word 
approaches were predominant. From the late 1960s to the early 
1980s, the vocabulary loads of primary level basal readers increased 
considerably, as the amount of systematic instruction in phonics in- 
creased (Chall, 1983). 

Research versus Mission 

It is significant that the beginnings of both vocabulary control 
and readability measurement had their roots in changing social con- 
ditions. Researchers felt that easier textbooks would make students' 
learning more effective. Since the prevailing educational philosophy 
was to provide an education for all, researchers sought ways to as- 
sess whether this criterion was met. Thus, research started with the 
desire to find objective means to determine whether textbooks were 
suitable for those using them. Also, from the start, the work had a 
strong mission behind it: to use objective measures to select and
produce textbooks suitable for all children. 

This strong mission led to some unexpected outcomes. The
early consensus that the books were too hard for most children, sup- 
ported by the early data from various vocabulary and readability 
measures, led to recommendations that the books be made easier. 
But soon, recommendations for easier textbooks could not be sup- 
ported by the research evidence. 
In my 1958 review of the research on vocabulary control, I
could find only one experimental study (Gates, 1930) designed to 
determine optimal difficulty of vocabulary for first grade reading. 
Gates tried experimental materials of varying vocabulary loads with 
children of different abilities. From these studies he estimated the 
number of repetitions necessary for best results with first grade children
of different intellectual abilities. He found that 35 repetitions
were best for those of average ability, 20 for those above average, 
and 40 to 45 for those below average. 
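
As a rough illustration of how repetition standards like these could be checked against a reader, the following Python sketch counts how often each word occurs and reports the words that fall short of a target. It is a minimal sketch under stated assumptions, not Gates's procedure: the function name, the ability labels, the tokenizing rule, the choice of 40 as the low end of "40 to 45," and the sample text are all hypothetical.

```python
from collections import Counter
import re

# Repetition targets as reported for Gates (1930) in the paragraph above.
# The key names and the use of 40 (low end of "40 to 45") are assumptions.
GATES_TARGETS = {"above_average": 20, "average": 35, "below_average": 40}


def under_repeated_words(text, ability="average"):
    """Return the words that occur fewer times than the target for `ability`."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    target = GATES_TARGETS[ability]
    return {word: n for word, n in counts.items() if n < target}


# Hypothetical primer text: "see" appears 36 times, "pony" only twice.
primer = "see " * 36 + "pony pony"
print(under_repeated_words(primer))
```

With the hypothetical text above, only "pony" misses the target for average readers, while "see" would also fall short under the stricter target assumed for below-average readers.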

When the vocabularies of basal readers are compared with 
the criteria established by Gates, we find that his standards were met 
in the late 1930s. By then, most first grade basals already had vo- 
cabulary loads recommended by the Gates experiment, but the vo- 
cabulary loads of primary basal readers continued to decline until 
the middle of the 1960s. 

When there was so little research evidence on optimal vocab- 
ulary control, why did basal reader vocabularies continue to de- 
cline? There is probably no one answer, but several may have some 
validity. 

There was confidence in the conclusions of the early vocabu- 
lary control studies that the basal readers were indeed too hard. (See 
Dolch, 1928.) There was also confidence that comparing the vocab- 
ularies of reading series from different publishers would lead to bet- 
ter standards of optimal difficulty. That is, an average from the 
various publishers would be closer to the optimal than the extremes.
If most publishers followed this, it was inevitable that books would 
be easier with each new publication date. 

Subject matter textbooks written for the higher grades, which 
were evaluated by readability measures during the beginning years, 
also were found to be too difficult for most students. And similar to 
vocabulary control researchers, readability researchers recom- 
mended easier textbooks. From the 1940s to the mid 1970s, there 
was a general decline in the difficulty of elementary and high school 
textbooks as judged by a variety of measures: readability formulas, 
level of maturity, difficulty of questions, and ratio of pictures to text 
(Chall, Conard, & Harris, 1977). After a long period of growing 
ease, reading instruction textbooks started to become more difficult 
in the early 1960s. This is explained best by the introduction of
stronger phonics programs in most basal reading series in the late 
1960s and the 1970s. 

From the beginning years until the present, there has been a 
tendency to confuse the scientific study of vocabulary control and 
the measurement of readability with their educational uses. The scientific
side of vocabulary control and readability measurement produced
tools, procedures, and understandings that helped make
possible optimal matches between readers and texts. Under certain 
conditions, this might suggest raising the level of difficulty; under 
other conditions, lowering the level. The objective tools made it 
possible to find that a book was too easy or too hard. But early re- 
search literature seldom reported that textbooks were too easy. This 
started to occur in more recent years. 

Text Factors Studies in the Early Years 

Vocabulary control, as well as early readability studies (1922
to 1926), tended to concentrate on vocabulary aspects such as difficulty,
diversity, and range. The Thorndike frequency word lists or
other word lists based on frequency of use in textbooks, readers, or 
by students in given grades were used to measure vocabulary diffi- 
culty. Judgment, experience, and correlational analysis were the 
standards for accepting one criterion of vocabulary difficulty as 
more reliable and valid than another. 

During the early years of readability measurement, most re- 
searchers concentrated on vocabulary; in a second period of read- 
ability studies, investigation concentrated on a greater variety of 
factors (1928 to 1939). As early as 1926, the Winnetka Formula, 
designed to predict comprehension difficulty and interest in chil- 
dren's books, used several vocabulary and sentence factors 
(Washburne & Vogel, 1926). The end of this second period brought
the Gray-Leary study (1935) that related eighty-two factors of text 
difficulty (including vocabulary, syntax, interest, and organization) 
to passages graded on the basis of reading comprehension perform- 
ance by adults. 
Standards of Optimal Difficulty

Generally, early studies of the vocabulary of reading text- 
books found that basal reader series intended for the same grade 
varied widely in vocabulary difficulty and diversity and that most of 
the books used vocabularies outside the experience of children in the 
intended grades (Dolch, 1928). In most of these studies the conclu- 
sions were based on comparing the vocabularies of given basals with 
specific word lists, with the words in other basals, or with basals 
published at earlier dates. There were about seventy such studies 
during the late 1920s and early 1930s, and almost all referred to 
some word list as a basis for estimating whether a word was known 
to children in the grade, considering words outside the list unknown 
and hence difficult. Practically none of the studies actually tested 
the materials judged easier or harder. 
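
The word list comparison described above can be sketched in a few lines of Python. This is an illustration only, not a reconstruction of any particular study: the function names, the tokenizing rule, and the tiny word list and passage are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unknown_word_rate(text, known_words):
    """Return the share of distinct words in `text` that are not on the list.

    Words off the list are treated as unknown and hence difficult, which is
    the untested assumption the early studies relied on.
    """
    distinct = set(tokenize(text))
    if not distinct:
        return 0.0
    unknown = distinct - set(known_words)
    return len(unknown) / len(distinct)


# Hypothetical grade-level word list and reader passage, for illustration only.
word_list = {"the", "cat", "sat", "on", "a", "mat", "and", "saw", "dog"}
passage = "The cat sat on the mat and saw a zebra."

rate = unknown_word_rate(passage, word_list)
print(f"{rate:.0%} of distinct words are off the list")
```

A calculation of this kind says only how a text compares with a list. It says nothing about whether children actually find the text hard, which is the gap the paragraph above points to.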

Reliance on these standards continued into the 1940s. In 
1941, Spache conducted an extensive analysis based on published 
vocabulary control studies, suggesting standards for the selection of 
primary level basal readers based on average vocabularies and 
ranges in published texts. On the assumption that most readers were 
still too hard, he indicated that the easier books for a given grade 
were "superior"; the harder books, "inferior."

What were the effects of these vocabulary studies? There is 
little doubt that the studies and their uses influenced authors of text- 
books and publishers of educational materials. The studies also 
probably influenced state adoption committees and schools in the 
selection of textbooks and other instructional materials. 

As noted earlier, the trend toward preferring reading textbooks
with lower vocabularies came from the wide use of sight approach
readers from the 1920s to the late 1960s (Chall, 1967,
1983). With less direct teaching of decoding and word recognition, 
the stories had to have fewer different words. Most early investiga- 
tions recommended that the vocabularies of primary reading text- 
books be low. There were some disagreements, however. Stone 
(1942) criticized a new reading series that presented only 1,147 dif- 
ferent words through the third grade readers. And Yoakam (1945, p. 
309) hoped that the use of readability formulas would "correct the
tendency to make them too easy." The majority view seemed to win.
The numbers and kinds of words in the basal readers continued to be 
an important issue in book production and selection. When books 
were ranked on various vocabulary factors, the results published, 
and superiority related to ease, publishers tried to meet or exceed
the "averages." That this happened during the beginning years can
be seen from a study by Hockett (1938), who found that first grade 
readers dropped from 644 different words in 1926 to 462 in 1937. 
Second grade readers dropped from 1,147 different words in 1930 
to 913 in 1937. 

The great drop in the vocabularies of primary readers from 
1920 to 1960 (Chall, 1967, 1983) had its roots in several sources: 
the mission to teach all children, the early research results that con- 
cluded books were too hard, the changes in teaching methods, and 
the changes in student population. It also seems to have been influ- 
enced by research methods that unfortunately based their recom- 
mendations on comparisons and averages rather than on 
experimental tests of students and texts. Except for the Gates (1930) 
experimental study of optimal vocabulary size for first grade read- 
ing programs, there were no experimental studies to determine the 
best vocabulary standards for students of different abilities. The 
comparison of vocabularies to word lists in readers of the same 
grade but from different publishers was not enough. Empirical data 
were needed on the effects on children's learning as a result of using 
different vocabularies. With the consensus that easier is better, it 
was almost inevitable that readers became easier without strong evi- 
dence of their effectiveness. 

Effects of Readability Assessment on Text Difficulty 

What can explain the decline in content textbook difficulty 
during the early years and later? Several reasons seem valid. As
noted earlier, there was a growing concern that schools must educate
all students, particularly the "new" senior high pupils whose parents 
had less academic background compared with previous high school 
populations. The mission for easier books thus had some basis in 
reality. The existence of valid, reliable, and easy to use tools for 
estimating text difficulty levels gave further impetus to the use of
readability measures by publishers, state and local textbook adoption
committees, and schools. The growing concern for individualized
instruction to meet the needs of students of varying reading
abilities and the desire to match students' abilities to the difficulties 
of textbooks made the technology of the measurement more useful 
to publishers and teachers. 

The growing ease of content textbooks, similar to the growing 
ease of reading textbooks, seems to stem from similar factors. Textbooks
of the 1920s were probably too hard for the newer students,
and there was a strong desire to make education more effective for 
them. What kept this mission so active and why did the textbooks 
become progressively easier, beyond that indicated by the research 
evidence and perhaps beyond their effectiveness for the students using
them (Chall, Conard, & Harris, 1977)?

Teachers, publishers, text adoption committees, researchers, 
and the instruments themselves have been blamed for the growing 
ease of textbooks. The most recent tendency to blame word lists and 
readability formulas for the poor quality of textbooks is unfortunate; 
it is similar to blaming poor reading ability on the use of standard- 
ized reading tests. The causes are probably more complex and inter- 
related, but when understood will prove more helpful. 

Optimal Difficulty 

Perhaps the weakest aspect of readability and vocabulary re- 
search and their uses is the paucity of experimental studies to estab- 
lish standards that are optimal for learning, comprehension, 
interest, and efficient reading. 

The vocabulary and readability standards of the early years
probably were headed in the right direction. The books probably 
were too hard for most of the students for whom they were intended. 
But continuing in that direction for forty or fifty years without ex- 
perimental verification may have become dysfunctional. The curric- 
ulum changes, students change, teachers change, methods change, 
and expectations of achievement change. Thus, standards need con- 
tinuous reevaluation. 
The More Things Change

This section attempts to compare some of the concepts of 
readability and vocabulary measurement proposed during the beginning
years with those of the 1980s.

As noted earlier, beginning readability research tended to fo- 
cus on vocabulary and syntax, although investigations soon began to 
study other factors (Gray & Leary, 1935). In the 1980s, the readability
concept of many tended to concentrate on text structure:
organization, coherence, and cohesion.

Many researchers of the 1980s have been critical of the lim- 
ited factors used in readability formulas. Newer critics give the im- 
pression that readability researchers in the early years overlooked 
factors other than words and sentences because they did not know 
they existed.

Excerpts from current analyses of readability and from re- 
searchers in the early years are presented here to give some insight 
into the historical development of the concept of readability. 

Are readability measures concerned only with surface mea- 
sures? An excerpt from Huggins and Adams (1980, p. 91) claims 
readability measures are concerned only with surface structures. 

Although readability measures can be found that correlate 
fairly well with text difficulty... their main weakness is that 
the difficulty of a passage involves its comprehension, and 
surface structure descriptions capture only some of the syn- 
tactic variables necessary to comprehension. As an extreme 
example of the inadequacy of these [readability] formulas, 
most of them would yield the same readability index on a 
passage if the word order within each phrase, and the order 
of the phrases within each sentence, were scrambled. 

Ojemann (1934) indicates that in addition to vocabulary diffi- 
culty, composition, and sentence structure, such qualitative factors 
as concreteness or abstractness of relationships (as distinguished 
from individual words used), obscurity, and incoherence in expres- 
sion were used in estimating text difficulty. The following excerpt 
from Ojemann reminds us of the excerpt from Huggins and Adams. 
In similar studies that have been carried out for the most
part with school children, qualitative factors have been 
overlooked in general. Their importance may be made 
clearer by considering an extreme example. If in a set of 
paragraphs the sentences were arranged in random order, 
the number of sentences, the vocabulary difficulty, etc.,
would remain constant, but there is considerable possibility 
that comprehension would be interfered with (p. 19). 

Contrary to what researchers in the 1980s often assume, Ojemann
did not treat the hard words in a mechanical way. He noted that difficult
passages contained difficult words because they discussed abstract
ideas, and the easy passages used common words because
they dealt with concrete experiences. 

These excerpts from 1934 and 1980 make remarkably similar 
points. They say that we cannot look at readability factors in a sur- 
face or mechanical manner. Further, each reports that mixing sen- 
tences and words could give the same readability rating, but it would 
not be its true measure of comprehension difficulty. 

The similarity of these observations raises an important issue 
with regard to earlier and later research in readability. Current re- 
searchers tend to view limitations as stemming from lack of knowl- 
edge. Yet much knowledge of language and communication with 
regard to text difficulty existed fifty or sixty years ago, possibly ear- 
lier. If the instruments and ideas were abused, explanations other 
than ignorance need to be sought. 

Using Readability Measures for Writing and Rewriting 

One current criticism of readability formulas is that they have 
led to poor quality writing because some editors and publishers have 
turned the readability formulas into means of obtaining lower read- 
ability scores. The claim is that writers use readability measures 
mechanically, substituting easier for harder words and shorter for 
longer sentences in order to achieve lower (assumed to be better) 
readability scores. The recent position paper on readability of the
IRA and NCTE makes a point about this issue (Cullinan & Fitzgerald, 
1984). 
It is interesting to note that the current concern about the negative
effects of mechanically simplifying texts also is expressed by
readability researchers and scholars in the beginning years. In the 
1930s Horn cautioned against the mechanical use of word lists and 
readability formulas for selecting and rewriting books in the social 
studies. He said that word lists and readability formulas do not ade- 
quately consider the conceptual difficulty of texts that may contrib- 
ute to poor understanding, although the words may be common. 
Horn gave examples showing that words of high frequency may 
even cause greater difficulty since pupils may give words the wrong 
meanings. He further demonstrated from the studies of his students 
that negligible effects on comprehension may result merely from the 
simplification of vocabulary.

There is a real danger that the mechanical and uncritical use
of data on vocabulary will not only affect adversely the
production, selection, and use of books but will result in
absurdities that will throw research in this field into disrepute
(Horn, 1937, p. 162).

The dangers that may stem from the use of readability formu- 
las for mechanical rewriting of texts have always been of great con- 
cern, from the beginning years to the present. And the cautions 
from readability researchers to editors, publishers, and to schools 
that they should not use the formulas rigidly and mechanically also 
have been steady throughout the years. However, it was found that 
benefits for comprehension could be achieved when readability 
principles are used creatively (Chall, 1958). 

To Conclude 

Readability and vocabulary studies have a long history of re- 
search and application. At times we wonder why we seem to over- 
look the hard gained knowledge of the past in our eagerness to 
discover the new and why we seem to overlook the continuity of the 
old in the new. The present and future years in readability research
and application should bring the knowledge of the early years to
prominence again, while newer and better instruments and standards
are developed. At the same time, it is hoped that the research
on how best to use the instruments is kept current and in tune with
the changing achievement of students, with standards, and with the
knowledge and art of teachers and educational publishers.

Writers sometimes call text readable for rather different reasons.
They may consider it legible or interesting or comprehensible.
Over the years, comprehensible has become the most
common reason. In fact, Harris and Hodges (1981) apply the terms 
readability and comprehensibility almost interchangeably in A Dic- 
tionary of Reading and Related Terms. This increased usage, at least 
in the field of education, stems from the widespread application of 
readability formulas. Many teachers think of readability primarily 
as a formula score. 

A review of the formative years (up to the present) in the de- 
velopment of readability measures can add further background to 
Zakaluk and Samuels' presentation (Chapter 7) on the role of readability
in matching materials to readers. By providing a review, this
chapter serves as an aid to understanding present and future read- 
ability research and application and as an introduction to the ensu- 
ing chapters. The points listed, most typified by a readability 
formula, illustrate the developing concept of readability. 

1. The almost exclusive emphasis on style variables in read- 
ability formulas. 

2. The reduction of style variables to semantic and syntactic 
factors. 

3. The search for a satisfactory criterion for formula devel- 
opment. 

4. The presentation of readability formula scores in terms of 
grade levels.

5. The efficient use of a word list for the semantic factor and
sentence length for the syntactic factor. 

6. The efficient use of syllable length for the semantic factor
and sentence length for the syntactic factor. 

7. The trend to increased emphasis on ease of use. 

8. The development of formulas for languages other than 
English. 

9. The introduction of cloze procedure as a convenient criterion
for formula development.

10. The growing criticism of readability formulas in terms of 
their developmental criteria and their grade level scores. 

11. The growing criticism of readability formulas in terms of
"writing to formula."

12. The need for improvement of current readability mea- 
sures. 

The history of readability is exhaustive. Chall (1958), Klare 
(1963, 1974), and Harrison (1980) provide added details for the ear- 
lier points. More recent points regarding research can be found in 
Klare (1982, 1984).

Almost Exclusive Emphasis on Style Variables 

Gray and Leary did not develop the first readability formula; 
that distinction belongs to Lively and Pressey (1923). But their work 
and their influential formula (Gray & Leary, 1935) illustrate the 
first point exceptionally well. They began their research by collect- 
ing ideas about possible contributors to readability from 100 experts 
and 100 library patrons and put together a list of 289 so-called 
factors, which they grouped into four categories. 

• Content 

• Style of expression and presentation 

• Format 

• General factors of organization 

Gray and Leary then cut this list to the 44 factors they could 
count reliably and which occurred often enough in the passages of 
their Adult Reading Test (their criterion passages) for statistical 
analysis. Of these 44 factors, 20 were significantly related to the
scores of adults of limited reading ability. And of these 20 factors,
they first used 8 in a multiple regression equation before finally
settling on the 5 style factors in the formula below.

$X_1 = -.01029X_2 + .009012X_5 - .02094X_6 - .03313X_7 - .01485X_8 + 3.774$

$X_1$ = average comprehension score

$X_2$ = number of different hard words not on Dale List of 769

$X_5$ = number of personal pronouns

$X_6$ = average number of words per sentence

$X_7$ = percentage of different words

$X_8$ = number of prepositional phrases

Gray and Leary's formula yielded a multiple R of .65 with 
their criterion. They had hoped to include variables other than the 
five style elements they ended with, but others could not meet their 
requirements of being counted reliably, occurring often enough in 
their passages, and contributing sufficiently to their regression 
equation. As it happened, their procedure of combining only style 
variables in a regression equation became the typical pattern for for- 
mula development. 
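
Once the five counts are in hand, the regression equation above can be applied mechanically. The following is a minimal sketch, assuming the coefficients pair with the variables in the order the definitions suggest (hard words, personal pronouns, sentence length, percentage of different words, prepositional phrases). The function name and the sample counts are hypothetical and are not taken from Gray and Leary.

```python
def gray_leary_score(hard_words, personal_pronouns, avg_sentence_length,
                     pct_different_words, prepositional_phrases):
    """Predicted average comprehension score (X1) for a passage.

    A higher score means an easier passage, since the criterion is a
    comprehension score rather than a grade level.
    """
    return (-0.01029 * hard_words            # X2: different hard words
            + 0.009012 * personal_pronouns   # X5: personal pronouns
            - 0.02094 * avg_sentence_length  # X6: words per sentence
            - 0.03313 * pct_different_words  # X7: percentage of different words
            - 0.01485 * prepositional_phrases  # X8: prepositional phrases
            + 3.774)


# Hypothetical counts for one sample passage (illustration only).
score = gray_leary_score(hard_words=10, personal_pronouns=5,
                         avg_sentence_length=20, pct_different_words=50,
                         prepositional_phrases=10)
print(round(score, 3))
```

Only personal pronouns carry a positive weight, so in this sketch every other variable lowers the predicted comprehension score as it grows.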

Reduction of Style to Semantic and Syntactic Factors 

Washburne and Morphett were among the earliest of the formula
developers; their first formula (Vogel & Washburne, 1928)
came out shortly after Lively and Pressey's pioneer effort. Their
second formula (Washburne & Morphett, 1938), however, shows
the second point more clearly since the three variables they used
reduce to semantic and syntactic factors.

$X_1 = .00255X_2 + .0458X_3 - .0307X_4 + 1.294$

$X_1$ = grade placement

$X_2$ = number of different words

$X_3$ = number of different uncommon words (outside Thorndike's 1,500)

$X_4$ = number of simple sentences in 75 sample sentences

Lorge's (1939) formula, which appeared soon after Washburne
and Morphett's, illustrates the same point.

$X_1 = .07X_2 + .1301X_3 + .1073X_4 + 1.6126$

$X_1$ = grade placement

$X_2$ = average sentence length in words

$X_3$ = number of prepositional phrases per 100 words

$X_4$ = number of different hard words per 100 words not on the Dale 769 word list

Entin and Klare (1978a) factor analyzed three extensive readability
matrices and found that semantic and syntactic factors still ac- 
counted for most of the variance. That is the good news; the bad 
news is that other kinds of style variables (literally hundreds have 
been tried) contribute so little added variance. 
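
Both grade placement equations in this section are plain linear combinations of a few word and sentence counts. The sketch below writes them as two small functions, assuming the plus sign for the second term of Lorge's equation, which is unclear in the printed copy. The function names and the sample inputs are hypothetical.

```python
def washburne_morphett_grade(different_words, uncommon_words, simple_sentences):
    """Grade placement from the 1938 Washburne-Morphett equation.

    uncommon_words: different words outside Thorndike's 1,500.
    simple_sentences: simple sentences among 75 sample sentences.
    """
    return (0.00255 * different_words
            + 0.0458 * uncommon_words
            - 0.0307 * simple_sentences
            + 1.294)


def lorge_grade(avg_sentence_length, prep_phrases_per_100, hard_words_per_100):
    """Grade placement from the 1939 Lorge equation."""
    return (0.07 * avg_sentence_length
            + 0.1301 * prep_phrases_per_100
            + 0.1073 * hard_words_per_100
            + 1.6126)


# Hypothetical counts, for illustration only.
print(round(washburne_morphett_grade(400, 20, 30), 3))
print(round(lorge_grade(15, 10, 8), 3))
```

In both functions the vocabulary terms stand in for the semantic factor and the sentence terms for the syntactic factor; more simple sentences lower the Washburne-Morphett estimate, while longer sentences raise Lorge's.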

The formulas of Washburne and Morphett and of Lorge also
illustrate the next point. 

Search for a Satisfactory Criterion 

Washburne and Morphett achieved a multiple R of .86 with
their criterion, a remarkable value for its time. This might have
been due at least partly to the nature of the criterion, which used the
reading test scores of children who reported reading and liking a
large sample of books. Few researchers have developed a criterion 
involving something other than comprehension alone, or one as ex- 
tensive and labor intensive. 

Lorge appeared to have found a more convenient criterion in 
McCall and Crabbs' Standard Test Lessons in Reading (1925). The 
Lessons had characteristics he and subsequent developers found 
very useful: a large number of passages, a variety of topics, a wide 
range of difficulty, and detailed grade levels. Lorge found a multiple 
R of .77 for his formula against this criterion. Though this value 
was somewhat lower than Washburne and Morphett's, the criterion
was readily available and more convenient for statistical analysis, 
and Standard Test Lessons in Reading became the standard in early 
research. Lorge later discovered an error in the calculations for his 
formula that led him to publish a revised formula (1948). The revi- 
sions were slight— scores from the two formulas correlated +.94 — 
but the new formula had a reduced multiple R of .67 with the 
McCall-Crabbs criterion passages. This might have been a cause for 
further research had it been discovered earlier, but the McCall-Crabbs
Lessons were too convenient for others to abandon (as will
be noted later). 

The formulas of Lorge and of Washburne and Morphett provided
the standard illustrated in the next point.

Presentation of Scores in Terms of Reading Grade Levels 

Readability formulas grew in popularity because they prom- 
ised teachers a way of matching reading materials to the abilities of 
readers. With the earliest formulas, the match could not be made 
conveniently. The Washburne-Morphett and Lorge formulas, however,
provided scores directly in terms of grade placement. This arrangement
contributed to the popularity of formulas; in fact, certain
formula makers added such scores to formulas that first appeared 
without them. 

Use of grade level scores turned out to be a mixed blessing, 
since this practice contributed to disagreement among formula estimates.
Formulas may disagree for other reasons, such as the variables
used, the developmental criteria used, and the range of ability
of the subjects. In addition, certain formula makers based their 
grade level criterion on the 50 percent comprehension level (e.g., 
where subjects could answer 50 percent of the questions on pas- 
sages) and others on the 75 percent level. Thus, disagreements were 
bound to increase. McLaughlin (1969) insisted on the 100 percent 
comprehension level (whatever that is) in developing his formula. 
As a consequence, several writers found that McLaughlin's formula 
consistently gave reading level estimates about two grades higher 
than the widely used formulas. 

The issue of grade level scores (particularly formula dis- 
agreements) continues, as will be emphasized later. Dale and Chall's 
formula (1948) illustrates both this point and the one to follow. 

Efficient Use of a Word List and Sentence Length 

Dale and Chall's formula was the most widely used formula 
in educational circles for many years. Their formula held up extremely
well (in fact, surpassed others) in validity, and set the
stage for other word list formulas. The McCall-Crabbs passages
served as criteria for developing this formula, presented here.

$X_{c50} = .1579X_1 + .0496X_2 + 3.6365$

$X_{c50}$ = reading grade score of pupils who can answer correctly
one-half the questions on a McCall-Crabbs passage

$X_1$ = percentage of words outside the Dale list of 3,000

$X_2$ = average sentence length in words

Table 1
Correction Table for Use with the Dale-Chall Readability Formula

Formula Score       Corrected Grade Level
4.9 and below       4 and below
5.0-5.9             5-6
6.0-6.9             7-8
7.0-7.9             9-10
8.0-8.9             11-12
9.0-9.9             13-15 (College)
10.0 and above      16+ (College graduate)

The formula used the 50 percent comprehension level as a 
criterion. However, a correction table also was provided for adjust- 
ing the formula scores to correspond more closely to difficulty, par- 
ticularly for harder materials. (See Table 1.) Dale and Chall report 
that with these corrections the comprehension level falls between 50
and 75 percent. 
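
The two steps described here, computing the raw formula score and then reading the corrected grade level from Table 1, can be sketched as follows. This is an illustration under stated assumptions, not Dale and Chall's own procedure: the function names and sample values are hypothetical, and raw scores between 4.9 and 5.0 are assumed to fall in the lowest band.

```python
def dale_chall_raw_score(pct_words_off_list, avg_sentence_length):
    """Raw Dale-Chall score at the 50 percent comprehension criterion."""
    return 0.1579 * pct_words_off_list + 0.0496 * avg_sentence_length + 3.6365


# (exclusive upper bound of the raw score, corrected grade level), from Table 1.
CORRECTION_TABLE = [
    (5.0, "4 and below"),
    (6.0, "5-6"),
    (7.0, "7-8"),
    (8.0, "9-10"),
    (9.0, "11-12"),
    (10.0, "13-15 (College)"),
]


def corrected_grade_level(raw_score):
    """Look up the corrected grade band for a raw formula score."""
    for upper_bound, grade in CORRECTION_TABLE:
        if raw_score < upper_bound:
            return grade
    return "16+ (College graduate)"


# Hypothetical passage: 10 percent of words off the Dale list,
# 20 words per sentence on average.
raw = dale_chall_raw_score(pct_words_off_list=10.0, avg_sentence_length=20.0)
print(round(raw, 4), corrected_grade_level(raw))
```

The lookup makes the curvilinear correction visible: one raw score point spans two grades in the middle bands and three at the college band, while the top band is open-ended.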

The table also serves another useful correction purpose. As 
Bormuth (1966) pointed out, language variables do not necessarily 
relate to comprehension difficulty in a linear fashion, yet formula 
makers use linear equations. This can introduce a certain degree of 
error in the formulas. The matter of curvilinearity can be seen 
clearly in the difficulty scale itself when grade levels are used. As 
reading material at the higher grade levels (high school and college)
is analyzed, the readability scores should begin to level off and go 
no higher than, perhaps, college graduate level. Throughout this 
part of the range, content knowledge increases in importance; readability
formulas, being based on style variables alone, cannot adequately
account for this. Furthermore, grade levels have little if any
meaning beyond college graduate level. 

Formulas that do not take account of curvilinearity can, at the 
extreme, provide absurd scores. An author submitted a passage 
from the California probate code for analysis by two research work- 
ers who had developed computer programs for one such formula. 
The programs were consistent: both reported that 122 years of
schooling would be necessary to understand the passage! 

The next formula, Flesch's popular Reading Ease, again illus- 
trates the above point and the one to follow. 

Efficient Use of Syllable Length and Sentence Length 

Flesch had developed one readability formula prior to the 
publication of his later and much better known Reading Ease for- 
mula (1948). He first included a variable, called personal refer- 
ences, that he hoped would combine with his style difficulty 
variables, thus bringing interest value into the scores. Several re- 
search workers quickly pointed out that this served more to dilute 
than to strengthen the value of the scores. Consequently, Flesch pro- 
posed separate Human Interest and Reading Ease formulas, both us- 
ing the McCall-Crabbs Lessons and the 75 percent comprehension 
level. As in similar attempts to bring in variables other than style 
difficulty, the former never achieved wide usage. The latter (below), 
however, became the most widely used formula outside educational 
circles. 

RE = 206.835 - .846 wl - 1.015 sl

RE = Reading Ease, on a scale from 100 (very easy to read)
to 0 (very difficult to read)

wl = word length, in syllables per 100 words

sl = average sentence length, in words

Flesch soon found that he needed a grade level scale to satisfy 
users. He went one better and provided two. The only difference in 
the two lay in the final columns of each. Both can be presented together
in Table 2 with the last two columns the only ones not common
to both.

Table 2
Interpretation Table for Flesch Reading Ease Scores

Description       Average      Average Number    Reading     Estimated School               Estimated
of Style          Sentence     of Syllables      Ease        Grades Completed               Reading Grade
                  Length       per 100 Words     Score

Very Easy         8 or less    123 or less       90 to 100   Fourth Grade                   Fifth Grade
Easy              11           131               80 to 90    Fifth Grade                    Sixth Grade
Fairly Easy       14           139               70 to 80    Sixth Grade                    Seventh Grade
Standard          17           147               60 to 70    Seventh or Eighth Grade        Eighth and Ninth Grades
Fairly Difficult  21           155               50 to 60    Some High School               Tenth to Twelfth Grades
Difficult         25           167               30 to 50    High School or Some College    Thirteenth to Sixteenth Grades (College)
Very Difficult    29 or more   192 or more       0 to 30     College                        College Graduate


Flesch's interpretation tables once again provided for the curvilinearity
in the grade level scale. In addition, the Reading Ease
formula, being simple to apply, serves as a good illustration of the 
next point. 
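
How simple the Reading Ease formula is to apply can be seen in a short sketch. It takes wl as syllables per 100 words, the unit of the syllable column in Table 2, and maps the score to the style descriptions in that table. The function names are hypothetical, and treating each band's lower bound as inclusive is an assumption, since the table's score ranges share their end points.

```python
def reading_ease(syllables_per_100_words, avg_sentence_length):
    """Flesch Reading Ease: roughly 100 (very easy) down to 0 (very difficult)."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * avg_sentence_length


# (inclusive lower bound of the score band, description of style), from Table 2.
STYLE_BANDS = [
    (90, "Very Easy"),
    (80, "Easy"),
    (70, "Fairly Easy"),
    (60, "Standard"),
    (50, "Fairly Difficult"),
    (30, "Difficult"),
]


def describe_style(score):
    """Return the Table 2 style description for a Reading Ease score."""
    for lower_bound, label in STYLE_BANDS:
        if score >= lower_bound:
            return label
    return "Very Difficult"


# The "Standard" row of Table 2: 147 syllables per 100 words, 17 words per sentence.
score = reading_ease(syllables_per_100_words=147, avg_sentence_length=17)
print(round(score, 3), describe_style(score))
```

Feeding each row's typical syllable count and sentence length back into the formula is a quick consistency check on the table, since the resulting score should land in that row's own band.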

Trend to Increased Emphasis on Ease of Use 

Danielson and Bryan (1963) were the first of many authors to 
develop a computerized readability formula to aid users in large 
scale applications. To make their programing simple, they used the 
variable of characters instead of syllables in their word count, and 
characters instead of words for their sentence count. 






DB#2 = 131.059 - 10.364cpsp - .194cpst 
DB#2 = score on a scale from 0 (hard) to 100 (easy) 
cpsp = characters per space 
cpst = characters per sentence 
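
The appeal of character counts for early programs is easy to see in code. The sketch below is not Danielson and Bryan's program; it assumes that characters are all non-blank characters (punctuation included), that a space is any run of whitespace between words, and that a sentence ends at a period, question mark, or exclamation point. The function names are hypothetical.

```python
import re


def danielson_bryan_2(chars_per_space, chars_per_sentence):
    """DB#2 score, on a scale from 0 (hard) to 100 (easy)."""
    return 131.059 - 10.364 * chars_per_space - 0.194 * chars_per_sentence


def count_features(text):
    """Return (characters per space, characters per sentence) for a text.

    Counting rules are simplifying assumptions made for this sketch.
    """
    chars = sum(1 for ch in text if not ch.isspace())
    spaces = len(re.findall(r"\s+", text.strip()))
    sentences = len(re.findall(r"[.!?]+", text))
    # Guard against division by zero for one-word or unpunctuated input.
    return chars / max(spaces, 1), chars / max(sentences, 1)


cpsp, cpst = count_features("The cat sat. The dog ran.")
print(cpsp, cpst, round(danielson_bryan_2(cpsp, cpst), 3))
```

No syllable rules or word lists are needed, only three counters, which is what made the approach practical for large scale use.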

Syllables are much harder than characters to count by com- 
puter, but syllable counters have been developed by at least six au- 
thors. Similarly, computer programs have been developed for a 
large number of formulas. Schuyler (1982) published one of the best 
wide range programs in its entirety so that potential users could 
copy it. The program will handle nine different formulas for users. 

Another significant move toward ease of usage is the Readability
Graph developed by Fry (1963). The Graph permits a direct
estimate of reading grade level on entering with syllable length and
number of sentences per 100 word sample, thus providing another
way of avoiding the manual use of a formula. It seems safe to say
that in its most recent version (1977), Fry's Graph is the most 
widely used of all readability methods. The development of a hand 
calculator for it, and the surprising development of a parallel com- 
puter program for it, attest to its popularity. 

Formulas for Languages Other than English 

Work in the development of readability formulas for the En- 
glish language began much earlier than similar work for other lan- 
guages. Spaulding's (1951) formulas for Spanish as a second 
language were the first to be published. Following is the more com- 
monly used of the three formulas eventually published. 

Difficulty = .1609(asl) + 33.18(d) + 2.20
asl = average sentence length 
d = density, based on a Density Word List 

Despite the later start, much research on the readability of 
other languages has been published since 1951. Research workers 
have written at least eight books and have developed formulas for at 
least the following languages other than English. 

Afrikaans                    Hebrew
Chinese                      Hindi (a modified American formula)
Danish                       Korean
Dutch (several formulas)     Russian
Finnish                      Spanish (many formulas)
French (several formulas)    Swedish
German (several formulas)    Vietnamese


Details of development work outside the United States can be 
found in Rabin's chapter in this publication. 

Cloze Procedure as a Criterion for Formula Development 

Cloze procedure, the deletion of words in text at stated intervals
(usually every fifth word), which readers are asked to fill in
correctly, can provide a good index of comprehensibility. This
characteristic makes it a good potential criterion for the develop- 
ment of readability formulas. Though the cloze procedure was de- 
veloped by Taylor in 1953, it was not until 1965 that Coleman first 
used it as a criterion. He developed four formulas. The one below,
using the two variables found in so much recent research, yielded a 
multiple correlation of .89 with cloze criterion scores (adding more 
variables raised the correlation very little). Even more striking, the 
formula's cross-validation value reached .88. 

c% = 1.16w + 1.48s - 37.95

c% = percentage of correct cloze completions

w = number of one syllable words per 100 words 

s = number of sentences per 100 words 

Cloze procedure has several characteristics that soon made it 
a very popular criterion for formula development. It is objective in 
scoring, easy to use and analyze, uses the text itself as the test, and 
yields higher correlations than the McCall-Crabbs Lessons in com- 
parison of the same formulas (Miller, 1972). 
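
The mechanics behind these characteristics can be sketched briefly: delete every fifth word, score replies by exact match, and compare the result with the percentage Coleman's formula predicts. This is a simplified illustration, not a standard cloze test. The function names are hypothetical, words are split on whitespace with punctuation left attached, and exact-match scoring that ignores letter case is an assumption.

```python
def make_cloze(text, every=5, blank="_____"):
    """Replace every `every`-th word with a blank; return (cloze text, answers)."""
    answers, output = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            answers.append(word)
            output.append(blank)
        else:
            output.append(word)
    return " ".join(output), answers


def cloze_percent_correct(responses, answers):
    """Percentage of blanks filled with exactly the deleted word."""
    if not answers:
        return 0.0
    correct = sum(r.strip().lower() == a.strip().lower()
                  for r, a in zip(responses, answers))
    return 100.0 * correct / len(answers)


def coleman_predicted_cloze(one_syllable_words_per_100, sentences_per_100):
    """Predicted percentage of correct cloze completions (Coleman, 1965)."""
    return 1.16 * one_syllable_words_per_100 + 1.48 * sentences_per_100 - 37.95


passage = "one two three four five six seven eight nine ten"
cloze_text, answers = make_cloze(passage)
print(cloze_text)
print(cloze_percent_correct(["five", "eleven"], answers))
print(round(coleman_predicted_cloze(70, 6), 2))
```

Scoring needs no judgment about whether an answer is acceptable, and the passage itself supplies the test items, which is the objectivity and convenience noted above.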

The popularity of cloze procedure was further enhanced 
when Coleman (1965; Miller & Coleman, 1967) and Bormuth 
(1969) developed extensive sets of passages scored in terms of cloze 
percentages correct. These passages have been used by others in 
developing and cross-validating their own formulas.

Criticism of Formulas in Terms of Developmental
Criteria and Grade Level Scores 

As noted earlier, McCall-Crabbs Standard Test Lessons in 
Reading proved to be a popular criterion for formula development, 
and such frequently used formulas as the Dale-Chall and Reading
Ease were based on them. Consequently, questions about the appropriateness
of the Lessons could inevitably raise questions about
many formulas. Stevens (1980) raised just such questions. She 
quoted McCall as saying the Lessons were meant only to be practice 
exercises and were not intended for rigorous testing or criterion pur- 
poses. 

What can be done in the face of such a charge? Research
workers can take (and have taken) several different approaches. 

1. Developers can turn to another criterion, such as a different
reading test or cloze procedure. Cloze procedure has been the
criterion of choice since well before the Stevens article. This is not 
to say that cloze is necessarily a perfect choice. Carver (1977) refers 
to it as a "rubber yardstick," since cloze scores reflect both the difficulty
of the material and readers' ability. Kintsch (1979) considers it
to be actually misleading as a measure of comprehension, arguing 
that it really is measuring redundancy instead. 

2. Developers can restandardize the Lessons. Harris and 
Jacobson (1974, 1979) did this before Stevens' criticism, since they 
felt the earlier norms were out of date. They reported an earlier cor- 
relation of .74 for four variables, but later correlations in the high 
.80s and low to middle .90s, lending some support to this approach. 

3. Research workers can examine comparable formulas based 
on the Lessons and on other criteria, to see how well the scores 
agree. In one such study (Klare, unpublished), three formulas that
have the same index variables— word length in syllables and sen- 
tence length in words— were compared. The three formulas and 
their criteria were 

• the Reading Ease formula (Flesch, 1948), based on the 
original Lessons norms; 

• the Kincaid version of the Reading Ease formula (Kincaid 
et al., 1975), which used the Gates-MacGinitie Reading 
Test as the basis for its grade levels; and

• the Fry Graph (Fry, 1977), where the grade levels came
(with some adjustment) from publishers' grade level 
assignments. 

The examination was made across the range of grades, with the following
results, suggesting that the Lessons and the formulas based
on them may be more robust than Stevens' article suggests. 

• The Flesch-Kincaid and Fry grade level assignments dif- 
fered by no more than one grade in their common range of 
six to sixteen grade levels. 

• Neither the Flesch-Kincaid nor the Fry differed by more
than two grades from the Flesch Reading Ease assignments 
beyond these levels. 

• The three formulas agreed (within one grade level) in their 
assignments for most grades. 

Though this comparison showed a surprising amount of 
agreement, there is still the question of whether the formulas could 
simply be agreeing in giving incorrect grade levels. This question is 
difficult to answer satisfactorily, but Harrison (1980) has made a 
start in comparing the assignments of nine formulas against pooled 
teacher judgments. He used twenty-four first year secondary texts 
and sixteen fourth year secondary texts in British schools and found 
that the teachers' judgments yielded average reading age scores 
(reading grade level plus five) of 11.30 and 13.14 respectively. The
two most predictive formulas, the Dale-Chall (1948) and the 
Mugford (1970), differed by half a year or less, up or down, from 
these values. The Reading Ease formula (Flesch, 1948) differed by 
one year; the FORCAST (Caylor et al., 1973) and SMOG (McLaughlin,
1969) differed by about two years; and the Fog Index (Gunning, 
1952) differed by almost three years. More work of this sort could 
help teachers by telling them which formulas are most predictive. 
Another helpful approach would be regression equations permitting 
a user to relate grade levels from one formula to those of another. 

4. Developers can abandon the use of grade level scores alto- 
gether, as recommended by a resolution of the Delegates Assembly 
of the International Reading Association (Reading Research Quarterly,
1981). This procedure has been followed by the College Entrance
Examination Board (1980, 1982), which uses DRP (Degrees
of Reading Power) units for both its new test and its readability formula
scores. The formula, a modification of one developed by Bormuth
(1969), is presented below.

$R = .886593 - .083640(let/w) + .161911(dll/w)^3 - .021401(w/sen) + .000577(w/sen)^2 - .000005(w/sen)^3$

R = readability in cloze units; this score is transformed
into DRP units using the formula DRP = (1 - R) x 100

let = letters in passage x

w = words in passage x

dll = Dale long list words in passage x

sen = sentences in passage x
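
Because the equation mixes ratios and higher powers, a short sketch may make the computation easier to follow. It assumes the exponents are a cube on the Dale long list ratio and a square and a cube on the words-per-sentence ratio, which is how the partly illegible printed superscripts are read here. The function names and passage counts are hypothetical.

```python
def bormuth_cloze_readability(letters, words, dale_long_list_words, sentences):
    """R, readability in cloze units, from raw passage counts."""
    let_w = letters / words                # average word length in letters
    dll_w = dale_long_list_words / words   # share of words on the Dale long list
    w_sen = words / sentences              # average sentence length in words
    return (0.886593
            - 0.083640 * let_w
            + 0.161911 * dll_w ** 3        # exponent assumed to be 3
            - 0.021401 * w_sen
            + 0.000577 * w_sen ** 2        # exponent assumed to be 2
            - 0.000005 * w_sen ** 3)       # exponent assumed to be 3


def drp_units(r):
    """Transform cloze-unit readability R into DRP units."""
    return (1 - r) * 100


# Hypothetical 100-word passage: 450 letters, 80 words on the Dale
# long list, 5 sentences.
r = bormuth_cloze_readability(letters=450, words=100,
                              dale_long_list_words=80, sentences=5)
print(round(r, 4), round(drp_units(r), 1))
```

Since R is a predicted cloze proportion, higher R means easier text, and the transformation reverses the direction so that harder material receives more DRP units.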

The use of DRP units obviates the need for grade levels in both
reading test and readability estimates (except in rough grouping for
the selection of appropriate test forms). Careful modification of typ- 
ical cloze techniques plus the use of the Rasch model also make it 
possible to avoid certain of the limitations of the cloze procedure in 
the preparation of parallel test forms. This approach is not without 
problems. Grade level scores, whatever their flaws, are familiar to 
teachers; DRP scores by themselves are not. Consequently, reading
material cannot be matched to a particular reader's ability unless the
reader has been tested with the DRP reading test and the material has
been analyzed with the DRP readability formula. As a solution, the
College Board has undertaken to analyze children's material with 
the DRP formula as it is published and to circulate a report (College 
Entrance Examination Board, 1982) on all such analyses. This is a 
big undertaking and still necessarily excludes old favorites pub- 
lished earlier; since the formula is too complex to apply easily by 
hand, software has been developed for Apple II computers. With the 
DRP arrangement, certain limitations inherent in degrees of compre- 
hension can be overcome. For example, teachers can assign reading 
material at a reader's tested independent or instructional level and 
avoid material at frustration level. 

The DRP program appears to be a significant step forward in 
matching reading material to readers. But it cannot answer all of the 
criticisms leveled at readability formulas, as the next point shows.

Figure 1
Two Ways of Looking at the Validity of Readability Measures

                         Prediction of        Production of
                         Readable Writing     Readable Writing

Readability Variables    Index                Causal
Validity Check           Correlational        Experimental





Criticism of Formulas in Terms of Writing to Formula 

Formula developers have long warned against the notion of 
writing to formula, arguing that it is at best misleading and at worst 
harmful. Klare (1976) has attempted to put the problem in perspec- 
tive by distinguishing between two ways of looking at the validity of 
readability measures. Figure 1 provides a capsule comparison of the 
two. 

Formulas can play a useful screening role in the prediction of 
readability, where only index variables in language are needed. But 
formulas cannot be used in the production of readable writing, be- 
cause index variables are not sufficient for this purpose. Such use 
would be analogous to holding a match under a thermometer to 
warm a room. For producing readable writing, more variables must 
be considered in both the text and the reader. Davison (this volume)
discusses this issue in detail along with the implications for text- 
books and teaching materials. Fry (this volume) raises this issue 
again and discusses ways of writing more readably without misusing 
readability measures. 

The following point returns to the readability measures themselves.

Need for Improvement of Readability Measures 

In a recent article, Chall (1980) pointed to some educational 
problems surrounding the use of readability measures and argued
for the need to improve current formulas. Others also have argued
for improvement, notably Harris and Jacobson (1979). They point 
to the need to include variables other than style difficulty if read- 
ability is to move beyond Herbert Spencer, the English stylist of the 
past century. 

Klare (1976) has looked at the question of validity in a paper 
examining thirty-six experimental readability studies concerned 
with improving text comprehensibility. In nineteen of the studies, 
significant differences in comprehension were found; in eleven, the 
differences were not significant; in six, mixed results were found. 
The following categories of characteristics (28 variables in all)
were examined in each study with a view to discovering which ones 
increased or decreased the probability of finding significant differ- 
ences: 

• experimental passages and how they were modified;

• tests and other dependent measures used;

• descriptions of the subjects and their characteristics;

• instructions given to the subjects;

• details of the experimental situation;

• statistical analyses employed; and

• results and detailed discussions based on them.

Figure 2, a slight revision (Klare, 1980) of the model in the earlier
paper, summarizes the results. 

Such a simple version of the model cannot adequately show 
the nature of the interactions or of the predictions that follow from 
the model, and space does not permit such detail here. It should be 
noted, however, that a number of experimental studies have sup- 
ported predictions from the model. For example, Denbow (1973) 
compared easier and harder versions of two contents, one of higher 
and one of lower interest, with information gain as the dependent 
measure. He found that the easier of both contents produced signifi- 
cantly greater gain than the harder; however, the amount of gain was 
significantly greater on the content that was lower in interest value. 
Fass and Schumacher (1978) tested the effect of motivation directly 
by using two groups, one of which had a special monetary incentive. 
Once again, the easier version produced significantly greater com- 
prehension only under lower motivation. Entin (Entin, 1980; Entin 
& Klare, 1985) provided further evidence of the effect of motivation 
by altering reader interest. She used twelve experimental passages,
six shown previously to be of high interest and six of low interest to
college freshmen. The passages were modified so that one version 
(standard) was at approximately grade twelve (freshman level) ac- 
cording to the Reading Ease formula and one was at approximately 
grade sixteen (difficult), yet with the same content according to 
judges. Both readability and interest resulted in a significant differ- 
ence in cloze scores on the passages, i.e., an additive effect. She did 
not find the interaction effect found by Denbow and by Fass and
Schumacher because the material was at (standard version) and
above (difficult version) the readers' normal ability levels, so the effects
of readability and interest were cumulative.

The degree of subjects' prior knowledge of content also can 
have an effect on whether readability changes will significantly af- 
fect comprehension scores. This was suggested in two earlier stud- 
ies (Funkhouser & Macoby, 1971; Klare, Mabry, & Gustafson, 
1955) but could only be presumed because of experimental condi- 
tions. Both studies seemed to show that as the degree of prior 
knowledge increased, the effect of readability decreased. In recent 
analyses, Entin and Klare (1978b, 1980) showed that a measured
degree of prior knowledge had a clear effect. Correlations between
readability levels and multiple-choice comprehension scores on test 
passages from a published reading test were essentially zero, but 
became moderately positive when corrections were made for prior 
knowledge of readers. Entin (Entin, 1980; Entin & Klare, 1985) 
later varied prior knowledge experimentally in her study of the ef- 
fects of interest and readability and again found a significant effect 
for this variable. The effect was not as clear-cut as with interest and 
readability because of problems in getting a completely satisfactory 
measure of prior knowledge. 

The above variables may interact with readability variables 
and thus play a part in whether formulas overestimate or underesti- 
mate reading difficulty. Can such reader variables be incorporated 
into readability formulas to improve their estimates? Not easily, but 
Kintsch (Kintsch, 1979; Kintsch & Vipond, 1979) has published 
some very encouraging results. Although arguing that he did not 
wish to "present a new readability formula," he reported a "proud 
.97" correlation for the following "formula" (Kintsch, 1979). 

Reading difficulty = 2.83 + .48rs + .69wf + .51pd + .23inf + .21c - .10arg

where
Reading difficulty = number of seconds of reading time per
proposition recalled on an immediate free-recall task
rs = number of reinstatement searches
wf = average word frequency
pd = proposition density
inf = number of inferences
c = number of processing cycles
arg = number of different arguments in a proposition list
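
As a rough illustration only, the "formula" above can be written as a small function. The coefficients are those reported in the text; the function and parameter names are hypothetical, and the sketch assumes each predictor has already been measured on the same scales Kintsch used, which the text does not spell out.

```python
def kintsch_reading_difficulty(rs, wf, pd, inf, c, arg):
    """Predicted seconds of reading time per proposition recalled.

    rs  = number of reinstatement searches
    wf  = average word frequency
    pd  = proposition density
    inf = number of inferences
    c   = number of processing cycles
    arg = number of different arguments in a proposition list
    Measurement scales are assumed to match the original study.
    """
    return (2.83 + 0.48 * rs + 0.69 * wf + 0.51 * pd
            + 0.23 * inf + 0.21 * c - 0.10 * arg)
```

Note that only arg carries a negative weight, so, other things being equal, a larger number of different arguments lowers the predicted difficulty.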
In a later study (Miller & Kintsch, 1980), Kintsch found a
multiple correlation of .86 with the same measure of reading
difficulty on twenty passages after adding to the above the predictor
variables of input size, sentence length, short term memory searches,
and buffer size. His approach provides an interesting combination of
traditional style variables with newer cognitive variables in
achieving improved readability estimates, but it does require testing of
potential readers. One can certainly hope that more labor intensive
approaches such as this, the DRP method, or the estimation method
described by Samuels and Zakaluk (this volume) will now be used.
Perhaps a good way to put the matter is Rothkopf's comment (1980)
that, for many, "practical considerations require the continued use of
surface readability indicators" at this time.

In that case, the following summary suggestions might help
to keep users of formulas from becoming misusers (Klare, 1984).

• Remember that different formulas may give different grade 
level scores on the same piece of writing. Though they may 
resemble thermometers in giving index values, they differ 
in that (like most educational and psychological tests) they 
do not have a common zero point. 

• Look over existing formulas and pick a good one for the
purpose at hand, but consider all formulas to be screening 
devices and all scores to be probability statements (Mon- 
teith, 1976). 

• Choose a formula with two variables for rough screening 
purposes; having only one variable decreases predictive- 
ness, but having more than two usually increases effort 
more than predictiveness. For critical applications or for 
research, apply more than one formula or try one of the 
newer, more complex formulas. 

• Increase the value of an analysis by taking a large random 
(or numerically spread) sampling. For critical applications 
or for research, analyze the entire piece of writing (a com- 
puter program can be of help). For most books, three sam- 
ples (often recommended) can give an indication of the 
average level of difficulty, but cannot say anything useful
about variations in difficulty. 

• Bear in mind that formula scores derive from counts of 
style difficulty; therefore, they become poorer predictors of 
difficulty at high grade levels (especially college) where 
content weighs more heavily. 

• Consider again the purpose of the intended reading mate- 
rial; training readers calls for more challenging material 
than merely informing or entertaining them. 

• Take into account other recognized contributors to
comprehension; otherwise, formula scores may overestimate or
underestimate difficulty. For example, using special
interests or incentives to get above average motivation can help
to keep challenging material from being frustrating. 

• Do not rely on formulas alone in selecting reading materi- 
als when this can be avoided. Include judges for character- 
istics that formulas cannot predict and to be sure that 
formulas have not been misused in preparing materials. Do 
not use just any judges— select experts or get more reliable 
opinions. 

• Prepare to shift material after tentative placement. Where
to draw the line between reading material that frustrates 
readers and reading material that challenges them cannot 
be specified easily with or without formulas. 

• Keep formulas out of the writing process itself. If you use 
formulas for feedback, try the writing-rewriting cycle de- 
scribed by Macdonald-Ross (19/ ^): 

Write -> Apply formula -> Revise -> Apply formula ...
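
The suggestions on sampling and on treating scores as screening devices can be sketched in code. The example below is a minimal sketch, not a validated tool: it uses the Flesch Reading Ease formula as the two-variable formula, a crude vowel-run syllable counter (an assumption that will miscount some words), and hypothetical helper names. It scores several numerically spread samples and reports their mean and range, so that variation in difficulty is visible rather than hidden behind one average.

```python
import re

def count_syllables(word):
    """Crude English heuristic (an assumption): one syllable per vowel run."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def split_sentences(text):
    """Split on terminal punctuation; adequate only for rough screening."""
    parts = re.findall(r"[^.!?]+(?:[.!?]+|$)", text)
    return [p.strip() for p in parts if p.strip()]

def reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*ASL - 84.6*ASW (higher = easier)."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)                          # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

def spread_samples(text, n_samples=3, sentences_per_sample=5):
    """Return samples whose starting points are spread evenly through the text."""
    sents = split_sentences(text)
    if n_samples < 2 or len(sents) <= sentences_per_sample:
        return [" ".join(sents)]
    last_start = len(sents) - sentences_per_sample
    starts = [round(i * last_start / (n_samples - 1)) for i in range(n_samples)]
    return [" ".join(sents[s:s + sentences_per_sample]) for s in starts]

def screen(text, n_samples=3, sentences_per_sample=5):
    """Score each sample and summarize the mean and the range of scores."""
    scores = [reading_ease(s) for s in spread_samples(text, n_samples, sentences_per_sample)]
    return {"mean": sum(scores) / len(scores), "low": min(scores), "high": max(scores)}
```

In the writing-rewriting cycle, a writer would call screen on a finished draft, revise with the reader in mind rather than the score, and call it again on the revision. A wide gap between "low" and "high" suggests that the average alone is misleading and that more samples, or the whole text, should be analyzed.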

No set of suggestions can prevent all formula users from be- 
coming misusers. In the following chapter, Davison discusses this 
issue in greater detail.

Assigning Grade Levels without
Formulas: Some Case Studies



Readability formulas have been widely used to assign grade
levels to texts on the basis of two text properties: average
sentence length and average word complexity. Since the formulas'
development in the 1920s and 1930s, reading researchers have been
aware of their limitations for assigning accurate and meaningful 
grade levels (Gray & Leary, 1935). Yet formulas continue to be 
used, particularly for assigning difficulty levels in school textbooks, 
because there are no simple, convenient alternatives that would as- 
sign more accurate levels. 

For the same reason, that there is no obvious alternative,
formulas continue to be used for another, less justifiable, purpose.
Texts often are edited to lower their readability levels by simplifying
vocabulary and shortening sentences. In the process,
comprehensibility is not improved, while explicit connections as well as expressive
and interesting words are lost. This fact about adaptations used in
basal readers has been noted and documented many times. Ohanian
(1987) describes how much of an interesting story is lost when the
syntax and vocabulary are simplified to meet the readability level
assigned to a basal reader.

Some problems with formulas are reviewed in other chapters
of this book, and other questions can be raised about the validity of
using formulas to predict whether a particular text can be read by a
specific reader or group of readers. Another problem is that formulas
are based on correlations with factors that cause comprehension
difficulty, not on the actual causes of difficulty. These issues are
reviewed in more detail in Davison and Green, 1988. Since formulas
do not define the sources of difficulty, they cannot be used as
guidelines for writing. The guidelines (including clear organization,
explicit connectives, and appropriate vocabulary) that have been
suggested are too general or subjective to be alternatives for
formulas (Davison & Kantor, 1982).

Research supporting this chapter was provided under Contract No. 400-81-0030, U.S. Office
of Educational Research, to the Center for the Study of Reading, University of Illinois.

Readability formulas probably will continue to be used until 
some widely applicable alternatives are found. The purpose of this 
chapter is to give case studies of alternative procedures that already 
exist for assigning grade levels. These case studies describe some 
situations in which it is not possible to use formulas and others in
which reviewers or editors have chosen not to use formulas because
of their many drawbacks. From these case histories and other
similar situations, it may be possible to discover generally applicable
alternatives to readability formulas. 

Trade Books for Children

The term trade books refers to books intended for children to
read outside of school in their leisure time. Teachers, librarians, and
parents often need to know which of these books would be
appropriate (in terms of reading difficulty) for a particular child or group of
children. Compared with school texts, trade books vary a great deal
in terms of subject matter, style, presentation, vocabulary, and
sentence structure. Many of the factors that influence whether a reader
will find such books difficult or easy to understand cannot be
measured by formulas. Rather, a skilled and experienced teacher or
other authority must estimate the age range and reading ability
appropriate for a particular book.

Specialists in children's literature, or librarians who are
familiar with children's reading preferences, can read a book
analytically and judge accurately its probable difficulty level without using
the word and sentence counts that go into formulas. The levels
reviewers assign are more relative and flexible than the levels assigned
by formulas. Reviewers' levels cover a two to three year range for
average readers in those age groups. But children in lower grades
may still like the book if they are very skilled readers, while
somewhat older children may enjoy the book if they do not read as well
as average children in their age or grade level.

Some of the factors reviewers consider are writing style (use 
of unusual words or complex sentence structures), the overall orga- 
nization of the book, and the kind of exposition used. In general, 
children like very clear organization, with the episodes following a 
normal sequence of time or progression of ideas from simple to 
more complex. The characters in a story also influence children's 
responses, since children tend to identify with protagonists of their 
own age or slightly older. 

These factors are not the only criteria for judging a book's
reading level. Much depends on the individual reader. For example,
poor readers find difficult words a great obstacle to reading, while
average to good readers do not have difficulty in understanding texts
because of such words. Rather they find unusual, expressive, or
colorful words amusing and interesting. An unusual kind of exposition,
such as one using flashbacks, can be made clear and interesting by a 
skilled author who uses it to heighten suspense or create an atmo- 
sphere of mystery. Young children may like certain stylistic features, 
such as plays on words, that older children might find silly. 

Some confirmation of the accuracy of estimated grade and 
age levels is usually available. Well written books that are appropri- 
ate for a particular level in content and style become successes. 
They are borrowed frequently from libraries and are kept in print for
a long time. Therefore, trade books provide a natural laboratory for 
discovering what makes books accessible to children. 

Interestingly, trade books were the basis for the first readabil- 
ity formulas (Vogel & Washburne, 1928) to come into general use. 
Vogel and Washburne tested a large number of children for reading 
ability on a standardized test. They also asked the children for titles 
of books they had read and liked. Vogel and Washburne correlated 
features of these books with the reading level of the readers who 
mentioned them. Similar studies could be done now to define what 
properties of trade books make them accessible and popular. The 
result could be to define operationally how reviewers assign grade
levels. This procedure could give some insight into the features of
texts that make books comprehensible to different age groups. 

Trade books are an important example of a situation in which 
readability formulas cannot be used accurately to judge text diffi- 
culty. Trained, experienced adult readers weighing a number of rela- 
tive factors can successfully assign grade and age ranges to trade 
books. Furthermore, it is possible to validate these subjective judg- 
ments when a book becomes a favorite with children at a certain 
grade level. 

Science Magazines for Children

Like trade books, magazines for children on such topics as 
nature, science, and exploration are a natural laboratory for discov- 
ering what features need to be considered in writing readable texts. 
Since editors and publishers of these magazines want children to un- 
derstand and enjoy the articles, they must decide how to present in- 
formation in an appropriate way for the intended audience. One of 
the problems they face is that articles on scientific concepts tend to 
use complex, technical words. An explanation of a scientific idea 
may have to relate ideas in complex and often long sentences. Each 
of these factors would increase the readability level assigned by 
readability formulas, perhaps in a way that does not reflect the
actual difficulty level of the text. That is, a formula may not be
sensitive to real obstacles to comprehension in one text, while at the same
time it predicts a high level of difficulty for another text that is actu- 
ally quite clear in most ways. 

To see what alternatives to formulas might exist, we can com- 
pare similar (and equally popular) science magazines. The maga- 
zines are different in that Magazine A does not use readability 
formulas at all, while Magazine B uses formulas extensively for ed- 
iting the stories in each issue. 

Both magazines try to choose stories that have subject matter 
interesting to children. They also try to limit the length of stories 
and to leave out topics that cannot be expressed clearly within these 
limits. They use illustrations to arouse interest in reading the article 
and to focus the reader's attention on important ideas.

The basic difference in approach lies in the way articles are
planned and edited. Magazine B, which uses readability formulas, 
starts with the picture layout of the page. Space is assigned for an 
article of a certain length, which is then written to fit this space. As 
much as possible, the article has to explain or refer to the pictures in 
the already existing layout. If the article has too many difficult 
words, or exceeds a certain level (sixth grade on the Fry formula), it 
must be rewritten. Rewriting often means words are simplified and 
sentences are shortened. The result in some cases is that the article 
cannot be structured to explain the subject matter clearly, and the 
relation between the pictures and the text is not always apparent. 

Magazine A has a policy of not using readability formulas.
The articles are planned to appeal to second and third grade children
who are beginning to read on their own, fourth and fifth grade
children who can understand somewhat more complex articles, and fifth
and sixth grade children who can read even more complex stories. 
Writers are given a set of guidelines for presenting the subject mat- 
ter at one of the age and grade ranges between second and sixth 
grade. 

A closer look at the stories in Magazine A shows how the 
writers and editors match the text with the intended readers. The 
fact that readability formulas are not used can be confirmed by look- 
ing at the average sentence length and use of words in the articles. 
The average length of sentences in all the articles does not vary
much, regardless of the intended age and grade level. The length of 
the articles themselves does vary, with the shortest ones being in- 
tended for the youngest readers. The use of conjunctions like when, 
if, and because increases in articles for older and more skilled read- 
ers. What distinguishes the levels of difficulty is the content and pre- 
sentation of the subject matter. 

The selections for the second and third grade children are
short, rely heavily on pictures, and are usually about a young ani- 
mal. Different episodes present the animal in a way that young chil- 
dren can easily identify with. In a clear time sequence, the article 
describes relationships with parents and siblings and how the animal 
eats, sleeps, is protected from danger, and learns new skills.

The articles for grades three to five tend to focus on classes of
creatures that are different but have some common characteristic,
such as living in water or sharing a particular habitat. These articles
emphasize contrasts and similarities as well as the relationship
between an animal and its environment. Articles of this type build on
concepts about individual species familiar to younger children, but
teach the readers to see more abstract generalizations about
individuals that do not look alike.

The articles for children in grades five and six are longer and
refer to more abstract or complex concepts, including cause and
effect. Some of the articles introduce the idea of theories and
hypotheses, intended to explain known and observed facts. Through reading
about how scientists form hypotheses to explain natural phenomena 
and how these hypotheses are tested, children learn scientific rea- 
soning and how to evaluate an explanation in relation to the evidence 
for it. 

In all the articles in Magazine A, the topic is made clear in 
the first part of the article, and the presentation of ideas follows a 
clear pattern. In stories for younger children, the sequence is usu- 
ally chronological, without flashbacks. In articles for older chil- 
dren, ideas are presented in a logical order, either following 
temporal sequence or placing cause before effect. What is not found 
in Magazine A, but is common in Magazine B, is the organization 
of information common to newspaper articles. This organization 
places the most important or striking facts first, the next most
important ones second, and so on, which tends to make the
connections between ideas less clear, especially to younger readers.

Publishers of children's magazines face the same problems as 
publishers of science or social studies textbooks. They must convey
complex, abstract ideas to children with limited conceptual
knowledge and reading ability. Readability formulas have limited
applicability to texts on these topics. An alternative set of principles to use
in writing scientific material for children can be derived by compar- 
ing and analyzing selections found in successful publications that do 
not use readability formulas. These guidelines include attention to 
interest, overall length, organization, and method of exposition.
They also include the careful use of appropriate illustrations and a
choice of topics appropriate to specific age levels. Consequently, 
publishers are able to provide articles that are graded for conceptual 
difficulty and do not exceed the reading capacity of the readers. 
These goals can be accomplished without using formulas for writ- 
ing and editing. 



Languages Other than English 

To estimate the difficulty of a text in another language also re- 
quires devising alternative techniques. In formulas developed for En- 
glish, correlations between comprehension performance and word 
difficulty and sentence length are based on texts written in English 
and do not automatically carry over to other languages. Many lan- 
guages have word and sentence structures different from those found 
in English. Although a formula could be adapted for another lan- 
guage, with changes reflecting what is difficult or complex, the re- 
vised formula could not be used reliably without being validated for 
texts in the new language. Such a procedure requires a substantial 
investment of time, effort, and money. Instead of adapting formulas, 
educators and researchers have tried to go directly to the issues
involved in text difficulty. What they have done shows another approach
to matching students with texts without using formulas. 

In India, researchers are trying to develop reading achieve- 
ment tests for seventh grade students in the major regional lan- 
guages, which are languages in which reading instruction is given. 
It is not possible, however, to construct a test of this kind without 
knowing which texts require a particular level of reading ability. 
One such language is Telugu, a South Indian language used in An- 
dhra Pradesh, one of the states of India. 

It would not be practical to try to adapt an English readability 
formula for use on Telugu texts. The correlation in English between 
familiar, easy words and words of one or two syllables does not hold 
in Telugu. Nouns and verbs may have multisyllabic affixes for case 
or tense endings. The more difficult words generally are not longer 
than familiar words. The sentence structure of this language is more
like Japanese than English, and for many reasons the correlation in
English between long sentences and complex sentences does not 
carry over to Telugu. For example, a sentence with subordinate 
clauses can be the same length as a simple sentence. 

So, to estimate levels of text difficulty, Indian government re- 
searchers are relying on the judgment of teachers who have taught at 
the seventh grade level and are familiar with what kind of texts can
be read by students who are making good progress in reading. A 
certain number of these texts have been chosen to be tried in pilot 
studies with seventh graders. The texts that best discriminate among 
levels of reading achievement will be used in the final version for 
large scale testing. 

The second example involves a Native American culture 
without a tradition of written language. The Yupik people of Alaska 
are concerned about preserving their language, which is rapidly be- 
ing replaced by English. To assure that new generations have some 
knowledge of Yupik, members of the community are constructing 
materials for instruction and reading practice in the Yupik language. 
They want to write texts with a range of difficulty that can be read 
by young children and others through the upper grades in school. 

It would not make sense in this case either to use or adapt a 
readability formula. The word and sentence structure of Yupik are 
quite different from those of English. The same word stem can
occur in simple form or with polysyllabic endings, so that word length
is not a reliable indicator of difficulty. In English, a sentence like He
made them a large house has many short words, while in Yupik it
would consist of a small number of very long words. Even if a
formula could be adapted to take into account these features of word
and sentence structure, it would be difficult to check its validity.
Since the language has not been written previously, there is no body
of written texts that could be used for this purpose. 

There are also many practical problems. No one in the Yupik
community is trained in education or research. The community has 
drawn on its members who are fluent in Yupik and sensitive to dif- 
ferent styles of speaking used in that language. To construct materi- 
als for younger children, they may make use of the style used for 
telling stories to small children. For older children, they may use
the style used when an adult explains information to another adult.
Since all of these activities are new and untried, success cannot be 
guaranteed the first time for every attempt. Something must be done 
as soon as possible to preserve a language in danger of rapidly being 
forgotten. By trial and error, members of the Yupik community are 
finding a reasonable approach to the problem of writing their lan- 
guage at various levels of difficulty. In this way, they are able to 
construct a written resource to keep the language alive among 
younger members of the community. 

Conclusion 

These examples have been taken from contexts in which read- 
ability formulas could not have been used, causing researchers and 
others to try to make the best possible use of available resources. 
Even in normal circumstances, there are often situations in which 
the use of formulas would be difficult or inappropriate. People are 
often at a loss when confronted by such situations and go back to 
using formulas inappropriately because it is difficult to find alterna- 
tives. These examples demonstrate that it is possible to assign diffi- 
culty levels to texts without the guidance of formulas. 

This chapter has presented a variety of case studies of situa- 
tions where readability has been estimated by some method other 
than a readability formula. Science writing requires use of technical 
and often difficult words to get across concepts, resulting in unreal- 
istically high readability ratings. Trade books may be assigned inac- 
curate readability ratings because formulas are not sensitive to 
features (such as literary style) that are important in these books. In 
countries with languages other than English, it is difficult and time 
consuming to develop new formulas. When the language has not 
been written before, time does not permit the use of adapted formu- 
las to grade newly created texts. In all of these situations, a tradi- 
tional readability formula based on word difficulty and sentence 
length is either unsuited or unavailable for achieving the desired 
goals. 

These case studies describe some actual situations that are 
not isolated or unusual. A careful examination can lead to the
definition and testing of alternative procedures for assigning grade
levels. For as long as there have been readability formulas, there have
been teachers and researchers who have had strong reservations 
about their accuracy and validity. Because of the lack of alternative 
procedures, the concerns voiced by these critics have had very little 
effect on the use of formulas. If there are no alternatives, formulas 
continue to be used. If formulas continue to be used, with all their 
flaws, it is hard to find alternatives and get them adopted for wide- 
spread use. It is hoped that now progress can be made and workable 
alternatives developed to interrupt this seemingly unbreakable 
cycle.

Determining Difficulty Levels
of Text Written in Languages
Other than English



A discussion of readability measurement in languages other 
than English requires reviewing the work of researchers both 
in the United States and abroad. This chapter emphasizes instru- 
ments and philosophies representative of researchers outside the 
United States and relates their research to what has been accom- 
plished here. The chart at the end of this chapter presents a more 
structured picture of what has been accomplished globally.
The major areas of discussion include: 

• A brief history of readability research on foreign language 
text. 

• The validity of using readability instruments to measure 
texts intended for students studying second languages. 

• The development of foreign language readability instru- 
ments in the United States. 

• The development of readability instruments abroad. 

• The use of cloze procedure in the measurement of readabil- 
ity in other countries. 

• Some differing philosophies abroad regarding readability 
evaluation. 

The twofold purpose of this chapter is to acquaint readers 
with the options available for determining the readability of text 
written in languages other than English and to make readers aware
of the techniques available for the development of a readability
measurement in any language.

A Brief History of Readability Research 
in Other Languages 

In his review of the use of readability measures for languages 
other than English, Klare (1974, 1984) pointed out that much of the
early research was conducted in the United States for the benefit of 
English speaking students studying additional languages. 

This process began with Tharp in 1939 and has continued. At
least seven formulas or their variations for Spanish text have been 
developed in this country (Crawford, 1984; Garcia, 1977; Gilliam, 
Pena, & Mountain, 1980; Patterson, 1972; Spaulding, 1951;
Thonis, 1976; Vari-Cartier, 1981) as well as instruments for Rus- 
sian (Rock, 1970), German (Walters, 1966), Hebrew (Nahshon, 
1957), Chinese (Yang, 1971), and Vietnamese (Nguyen & Henkin, 
1985). Several of the formulas are new, demonstrating an ongoing 
interest in the evaluation of second language text for language study 
and for use with recent immigrants. 

The earliest readability measures developed in Europe were 
based on modifications of the Flesch Reading Ease Formula (1950). 
Kandel and Moles (1958) adapted the instrument to the French lan- 
guage; Fernandez Huerta (1959) formulated a Spanish version. De 
Landsheere (1963, 1970) continued this work, publishing first in 
French and then in German. Douma (1960) and Brouwer (1963) 
used variations of the Flesch formula in the development of mea- 
sures for text in the Dutch language. 

Intensive research on more original instruments began in Eu- 
rope in the early 1960s. In Sweden, Bjornsson (1968a, 1968b, 
1983) abandoned the regression equation in favor of an additive for- 
mula. This technique was later elaborated by Bamberger and Vane- 
cek (1982, 1984) working on German text in Austria and simplified 
for use with English materials by Anderson (1983) in Australia. 

The first original German formula developed in Europe was 
that of Fucks (1955). Sentence length was multiplied by word length 
to yield a difficulty level. This instrument produced results similar
to those from the Fry graph but was judged inappropriate for the
German language, probably due to longer words in German than in
English. Other European research in German followed (Bamberger,
1973; Briest, 1974; Dickes & Steiwer, 1977; Groeben, 1972;
Nestler, 1977).
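
A product index of the kind attributed to Fucks can be sketched briefly. This is an illustration under stated assumptions, not a reconstruction of his procedure: the text says only that sentence length was multiplied by word length, so the units chosen here (words per sentence and syllables per word), the vowel-run syllable heuristic, and the function name are all hypothetical.

```python
import re

def product_index(text):
    """Mean sentence length (words) times mean word length (syllables).

    Units are an assumption; syllables are approximated by counting runs
    of vowels, including the German umlauts.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)   # runs of letters, Unicode aware
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(max(1, len(re.findall(r"[aeiouyäöü]+", w.lower())))
                    for w in words)
    mean_sentence_length = len(words) / len(sentences)
    mean_word_length = syllables / len(words)
    return mean_sentence_length * mean_word_length
```

Because the two means are multiplied rather than weighted and added, a language with characteristically longer words raises every score in proportion, which is consistent with the observation above about German.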

In Holland the research of van Hauwermeiren (1972) resulted 
in six new formulas, each using a different combination of varia- 
bles. Two additional investigations followed (Zondervan, van Steen, 
& Gunneweg, 1976; Staphorsius & Krom, 1985).

Meanwhile, Henry (1975) set out to develop a readability in- 
strument especially for French. Two studies had already been done 
at the University of Liège. Foucar^ (1963) had shown that the
difficulty levels of popular texts were lower than those of texts rejected.
De Landsheere (1964) found that, in some cases, a high human in- 
terest score on the Flesch could provoke a rejection of, instead of an 
attraction to, the material. 

Henry felt this reaction raised the question of whether a
readability instrument developed on one certain group of books or
subject matter could be applied to evaluate other materials.

Richaudeau (1973, 1981, 1985) undertook research which 
convinced him that the simplest sentence is not necessarily the most
easily understood. He, like Kintsch (1979), conducted investiga- 
tions which demonstrated that certain transformations (whether 
long or short) appear to stay with the reader longer. Richaudeau 
proposed an experimental formula that spoke to the syntactic com- 
plexity of the reading material rather than its grade level. 

Major research in the development of instruments for use 
with Spanish text has emanated from Venezuela (Gutierrez et al.,
1972; Morles, 1975, 1981; Rodríguez Trujillo, 1978, 1980, 1983)
and Spain (López Rodríguez, 1981, 1982; Rodríguez Diéguez,
1983, 1987). Several of the Venezuelan projects were initiated under 
John Bormuth at the University of Chicago. Original investigation 
in Spain did not begin until the 1980s. Research has been ongoing in 
both countries. 

Although English is the basic language used in the United 
Kingdom, a survey of readability in other countries would be
incomplete without mention of research conducted there. Two major
works are those of Gilliland (1972, 1975) and Harrison (1980).
Gilliland's is a more theoretical survey of variables: linguistic,
phonological, and physical. Harrison's research concerns the practical use
of readability measurement in the classroom. It includes results of 
two government studies related to the suitability of the textbooks 
used by British school children: the Bullock committee inquiry into 
the teaching of English (Department of Education and Science, 
1975) and the Effective Use of Reading Project (Lunzer & Gardner, 
1979). 

The Bullock Report stresses the importance of assessing diffi- 
culty levels of texts to match children to their study materials. The 
result has been increased attention in the United Kingdom to the use 
of readability measurements, with instruction in their use included 
in many preservice and inservice teacher education programs. In a survey
conducted as part of the Lunzer and Gardner project, computerized
versions of six readability formulas were used to evaluate 125 texts
from four subject areas. Results were compared with teacher
judgment. The Dale-Chall formula, though time consuming to calculate,
was found to be the best overall. Scores yielded by SMOG and FOG
were higher than teacher estimates. On difficult material, scores at 
the upper levels were hard to interpret. 

The Validity of Using Readability Instruments
to Measure Texts

In this monograph and elsewhere, Klare (1974, 1984) has 
discussed the choice of criteria and variables on which to base read- 
ability instruments. Of the more than 250 variables studied, word 
length (alone or in combination with word frequency) and sentence
length account for most of the variance in the measurement of read- 
ing materials. 

Laroche (1979) considered the use of the cloze procedure or 
word frequency lists questionable as criteria for establishing
formulas. A cloze test (Taylor, 1953) is often administered to a criterion
population and then used as the basis for a readability instrument. 

Laroche noted that results of a cloze test are said to reflect a 
basic intuition about the structure and vocabulary of the target
language. He saw no such population available in the case of foreign
language reading materials. Likewise, he considered the use of
word frequency lists a fallacy, pointing out that the use of a word is
related to the "language bath" in which one is immersed, and the
linguistic ambiance of the native speaker would differ from that of
the language student. 

In discussing the psychosocial dimensions of language
acquisition, Ervin-Tripp (1973) observed that the age at which a second
language is learned affects the body of meaning acquired. While a
child's thinking might be more oriented toward personal needs, an
adult's speech would reflect a more complex development of
knowledge and skills. This would influence vocabulary usage. There is
also a difference between the way one mentally processes the
language (or languages) with which one has grown up and those newly
acquired. This would support Laroche's contention that there would
not be the same (or even similar) intuition for language among those
in a nonnative criterion population as in a native one.

Laroche appealed for greater consideration of linguistic vari- 
ables and, citing Bormuth's work (1970), greater collaboration be- 
tween the disciplines in the study of reading comprehension. 
Recognizing the need for readability measurement in foreign lan- 
guage instruction, he recommended an instrument that would take 
into account cognate count and frequency, sentence length as a re- 
flection of syntactic complexity, and phrase structure complexity. 

In contrast to Laroche and Ervin-Tripp, Schulz (1981) 
claimed that psycholinguists assume that once sound-symbol corre- 
spondences have been established and students are familiar with a 
body of vocabulary and major patterns of syntax and morphology, 
reading in a second language is identical to reading in one's native 
tongue. Schulz quoted no source for this theory, however, and
described a study by Clarke (1980) that seemed to refute this stand.
Clarke administered an ESL test to two groups of Spanish speaking
adults who had been grouped into good and poor readers in their 
native language. Results showed much less variation in their scores 
on the ESL test than on a Spanish reading test, demonstrating the 
leveling effect of the belated acquisition of the second language.

Schulz and Laroche were concerned that there be a way of
avoiding frustrational reading of literary texts in a foreign language.
Though aware of Laroche's position, Schulz chose to dismiss it on
the grounds that the limited research available supported the use of
similar linguistic criteria for measuring readability in all western
languages. She dealt only minimally with the question of the
appropriateness of current evaluative instruments for nonnative speakers.
In discussing the use of cloze procedure, she observed that a foreign
language student might guess the meaning of a missing word from
context and not be able to supply the specific foreign word.

Foreign Language Readability Instruments 
in the United States 

Whether because of the popularity of Spanish language study 
in our schools or the proximity of Spanish speaking countries and 
the resultant immigration of their citizens, several instruments for 
the evaluation of Spanish text have been developed in the United 
States. 

Following the Dale-Chall model (1948), Spaulding (1951,
1956) developed a formula using two variables: word usage or
frequency as measured by a density calculation, and sentence
complexity as measured by average sentence length. Word frequency
was based on the number of words in a passage that did not appear
on the Buchanan list (1941) of the 1,500 most frequently used words
in Spanish. Spaulding's formula was adopted by inter-American
groups.

Spaulding's procedure was later adapted by Patterson (1972) 
for use by religious workers dealing with readers with minimal
reading ability. Thonis (1976) used Patterson's descriptions of the 
reading skills needed to understand materials in the various catego- 
ries in Spaulding's formula to establish grade levels. Since the pro- 
cedures followed are not clear, great credence has not been given to 
Thonis' research. 

As with English language measures, faster, simpler methods 
of computing the readability of Spanish language materials were
sought. At least four studies (Crawford, 1984; Garcia, 1977;
Gilliam, Pena, & Mountain, 1980; Vari-Cartier, 1981) were based on
the Fry Readability Graph (1968, 1977), which uses word and sen- 
tence lengths as variables. 

In all four projects, it was determined that the syllable count 
was significantly higher for a 100 word passage in Spanish than for
a similar one in English, probably due to the fact that all vowels are 
pronounced in Spanish. 

Gilliam, Pena, and Mountain evaluated twenty-two books
designed for grades one through three using Fry's original graph. They
concluded that it would be necessary to subtract 67 from the average
number of syllables for the closest equivalencies between Spanish
and English. Both Garcia (1977) and Vari-Cartier (1981)
determined that sentence length also should be adjusted. Garcia's
criterion was a basal reading series in Spanish. Vari-Cartier used 127
samples of Spanish prose materials to develop the FRASE (Fry
Readability Adaptation for Spanish Evaluation) graph. She suggested
that the procedures used in developing this graph could be applied to 
languages other than Spanish by adjusting the parameters for mini- 
mum and maximum sentence and syllable counts and readability 
designations. 
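
The syllable adjustment reported by Gilliam, Pena, and Mountain can be
sketched in a few lines. This is only an illustration: the function name
and the sample counts are hypothetical, and the sketch covers only the
subtraction of 67 syllables described above, not the sentence length
adjustments of Garcia or Vari-Cartier, whose values are not given here.

```python
from statistics import mean

# Offset reported above for Spanish text plotted on Fry's original graph.
SPANISH_SYLLABLE_OFFSET = 67


def adjusted_syllable_count(syllables_per_sample, offset=SPANISH_SYLLABLE_OFFSET):
    """Average the syllable counts of several 100 word samples and
    subtract the offset, giving a value to plot on the English graph.

    `syllables_per_sample` is a list of syllable counts, one per
    100 word Spanish sample (hypothetical input format).
    """
    if not syllables_per_sample:
        raise ValueError("at least one sample is required")
    return mean(syllables_per_sample) - offset


# Three hypothetical samples averaging 200 syllables per 100 words.
print(adjusted_syllable_count([190, 200, 210]))  # 133
```

The adjusted figure would then be read against the syllable axis of the
graph together with the number of sentences per 100 words.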

Crawford's research (1984) was supported by the U.S. De- 
partment of Education under the Bilingual Education Act. He chose 
as his criterion the Laidlaw series of elementary Spanish texts 
(Pastor et al., 1971) after determining that the progression of
increase for average sentence length and number of syllables per 100
words was more regular in this series than in the nine other series he 
evaluated. 

After exhaustive international correspondence, Schwartz 
(1975) concluded that no adequate measure existed for instructional 
materials at the elementary level in German and she adapted the Fry 
graph to that language. Using samples from a series of West
German basal readers dating from the post World War II era as the
criterion, she determined that the longer length of German words results
in a count of from 25 to 37 syllables higher than in English. The 
number of sentences per 100 words, however, was very close for 
corresponding grade levels.

Those developing other formulas in the United States for use
with foreign text also found semantic (word length or frequency)
and syntactic (sentence length) factors to be the most predictive,
though in the case of Oriental languages a need existed for addi- 
tional considerations. Yang (1971) included character factors in his 
variables. A Vietnamese instrument (Nguyen & Henkin, 1982) in- 
cluded tonal marks, word marks, and hyphens as part of letter 
count. 

Development of Readability Instruments Abroad 

Although research in the United States has dictated the use of 
a limited number of linguistic variables, instruments developed 
abroad have sometimes been more complex. 

French. In France, Henry (1975) developed three formulas:
an eight variable instrument (he considered it the most valid) that
was very complicated and required a knowledge of linguistics; a
computerized formula with limited practicality; and a formula
designed for manual use by teachers. This last formula took into
account only three variables: number of words per sentence, number
of words absent from the Gougenheim et al. word list (1967), and
first names only used with exclamation points and quotation marks.

All three instruments can be used on three levels: grades five
and six, eight and nine, and eleven and twelve, allowing for the
evaluation of the same text at each grade level. Measurement is in
terms of percentage on a cloze, with 35-45 percent indicated as the 
comfort zone. Anything below is too difficult; anything above is not 
sufficiently challenging. Graphs eliminate the need for calculations. 
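
The interpretation of Henry's cloze percentages can be illustrated with a
short sketch. The function name is hypothetical, and treating the
boundary values of 35 and 45 as part of the comfort zone is an
assumption, since the text does not say how scores falling exactly on
the limits were handled.

```python
def classify_cloze_percentage(percent_correct):
    """Classify a predicted cloze percentage against the 35-45 percent
    comfort zone described above.

    Assumption: the limits 35 and 45 are counted as inside the zone.
    """
    if not 0 <= percent_correct <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_correct < 35:
        return "too difficult"
    if percent_correct > 45:
        return "not sufficiently challenging"
    return "comfort zone"


for score in (28, 40, 52):
    print(score, classify_cloze_percentage(score))
```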

Spanish. Gutierrez et al. (1972), working in Venezuela, were
responsible for what appears to be the first original readability
formula for use with Spanish text developed outside the United States.
A multiple regression equation with cloze as the criterion, it was 
validated only at the sixth grade level. This research was conducted 
under Venezuela's Ministry of Education in response to the great 
need for a method of matching students and their text materials. Un- 
fortunately, many of Gutierrez's compatriots neither understood nor 
were ready to accept the concept of readability measurement, and 
the procedure was never widely used. 



Since publishers' evaluations of the readability levels of Vene- 
zuelan textbooks are still inadequate, and many teachers have only 
the equivalent of a high school education, the need persists for some 
type of readability instrument. Currently, Rodríguez Trujillo
(undated) is attempting to develop an evaluative technique that can be
used to determine both the difficulty levels of educational materials 
and the reading ability of the students using them. A procedure for 
Spanish modeled on Carver's Rauding Scale (1976) is being consid- 
ered. 

In Spain, López Rodríguez (1981, 1983) studied twenty-six
linguistic variables, selecting seven of these for her first formula.
Among those used was a list of common vocabulary by García Hoz
(1953). Her criterion was derived from cloze tests, each
administered to ten students. Rodríguez Diéguez (1983) added eight
variables to his predecessor's list, using twelve in his instrument. His
criterion was developed from 123 cloze tests, each also administered
to ten subjects. Currently, he is working on a formula that will
extend to the end of college in two year intervals.

Swedish. Many of the aforementioned readability formulas 
are in the form of regression equations. Bjornsson (1968a, 1968b) 
of Sweden was a pioneer in the development of additive formulas, a 
technique in which linguistic factors are simply added together and 
the result compared with a predetermined set of criteria. 

This was not done arbitrarily. Bjornsson worked in several 
languages. In one of his many research studies (1974) he used 100 
texts. Their levels of difficulty were judged by two groups of judges, 
each evaluating half of the same texts. Correlation was quite high 
between the average assessments of the two groups. Based on his 
results he concluded that, contrary to traditional belief, judges' rat- 
ings would be reliable if three conditions existed: (1) they were 
made by a sufficiently large group of persons, (2) the passages were 
relatively long, and (3) the range of difficulty in the text battery was 
wide. The average correlation coefficient for groups of six judges 
was .94, as opposed to an average of .99 for twenty-four or more. 
Bjornsson originally attempted to develop a regression equation for
his Lix (short for läsbarhetsindex) readability index in Swedish.
Using all 100 texts, he derived an acceptable equation based on the
calculation of multiple correlation. However, when he divided his
texts in half and recalculated the equation, he found he would have
obtained completely different equations and coefficients of validity
if his study had happened to include the first fifty texts or the second
separately. Bjornsson concluded that regression equations were
closely dependent on the composition of the criterion and not
suitable as readability formulas, and so he turned to the additive method
(Bjornsson, 1983). After experimenting in Swedish with twelve
variables, he settled on two as the best predictors of text difficulty:
sentence length and percentage of long words (in this case those
with seven or more letters).
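
The additive calculation just described can be sketched as follows. This
is a minimal illustration rather than Bjornsson's own procedure: the
function name is hypothetical, and the rules used to split sentences (at
periods, exclamation points, and question marks) and words (unbroken
runs of letters) are simplifying assumptions.

```python
import re


def lix(text, long_word_min_letters=7):
    """Add average sentence length (words per sentence) to the
    percentage of long words (seven or more letters by default)."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is an unbroken run of letters, so a hyphenated
    # compound counts as two words.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    average_sentence_length = len(words) / len(sentences)
    long_words = sum(1 for w in words if len(w) >= long_word_min_letters)
    percent_long_words = 100 * long_words / len(words)
    return average_sentence_length + percent_long_words


sample = "The cat sat. Readability matters greatly."
print(lix(sample))  # 53.0: 3 words per sentence + 50 percent long words
```

Because the two factors are simply summed, no coefficients have to be
estimated from a criterion set of texts, which is the property that led
Bjornsson away from regression equations.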

German. Variables used in German readability measures 
have been varied and plentiful. Dickes and Steiwer (1977)
developed several formulas, one with as many as eight variables.
Groeben (1972) used the level of abstractness of words to determine 
text difficulty. Nestler's formula (1977) dealt with the conceptual 
levels of words in three categories: (1) generally known words, (2)
hard words, and (3) rare professional words. 

Bamberger (1973), working alone and eventually with others
(Bamberger & Vanecek, 1982, 1984; Bamberger & Rabin, 1984),
initiated a project to measure German textual materials by both
subjective and objective means. A "readability profile" composed of
five nonlinguistic variables (content, organization, print, style, and
motivation) was used in combination with a series of regression
formulas.

The checklist of more than thirty items yielded grade levels 
that could be compared with those given by the formulas. When the 
combination of the language difficulty and the readability profile 
was applied to several hundred books in a cross validation, it was 
demonstrated that in approximately 70 percent of the cases, the 
grade level yielded by the profile was similar to that resulting from
the use of the formulas. They felt this was an indication of the use- 
fulness of readability formulas. 

Learning of Bjornsson's additive formula, Bamberger and 
Vanecek elaborated on the technique by adding other linguistic fac- 
tors. They used as their criteria 120 children's storybooks and 200 
nonfiction textbooks that previously had been arranged into grade
levels through the use of pooled assessment and the application of
readability formulas. Tables were developed that showed the aver- 
age values of six linguistic factors, plus the calculation of Lix, by 
grade level, for works of fiction and nonfiction in the German lan- 
guage. 

Both these tables and the readability profile were designed so 
that educators could discern which individual variable or combina- 
tion might be causing difficulty. These variables then could be han- 
dled instructionally. Much of the Austrian evaluative procedure has 
now been computerized. 

Danish. Denmark also has benefited from Bjornsson's
research on Lix (Jansen, 1987). In the 1960s, newspaper publishers
became interested in widening the use of newspapers in schools. 
They contacted the Danish Institute for Educational Research to ob- 
tain the level of linguistic difficulty of a number of daily papers. 
Research already had been started by Jesper Florander and Mogens 
Jansen (1966) when a query to Swedish colleagues brought news of 
Bjornsson's studies (1964). Since Swedish and Danish are similar 
languages, the Danes opted to adapt Lix to their purposes. The cur- 
rent Danish readability evaluation represents the sum of the average 
length of meaning (sentence length) and the percentage of long 
words (words of more than six letters). 

Danish researchers see readability measurement as the
interaction among three components: linguistic, represented by Lix;
visual, including typography, layout, paper, and print; and the
"contents of the text," defined as the personal interest to the reader of
the contents of the text. These components relate to three levels of
readers in a 2x3 schema: rebus, those who are either beginners or
disabled readers; transition, those having reached a degree of
reading competency; and content, those able to choose texts solely on
the basis of content without concern for external appearance or
language. All Danish teaching materials, all children's books, and
many books for young people and adults have been evaluated since
1970. Through the ongoing use of Lix, it has been possible to follow
the development of the linguistic levels of books. Nonfiction works
for nine to thirteen year olds have become more difficult since the
late 1960s, and most children have difficulty reading many of the
nonfiction books published for them. The Lix Committee has
published three official reports on their efforts (Jakobsen, 1971, 1976,
1983).

English. Anderson (1967, 1971), who had previously
experimented with the use of cloze procedure and a readability scale to
evaluate the readability levels of children's books used in Australian
schools, also became interested in Lix. He developed conversion
tables for expressing Lix scores in grade levels for English language
materials. As an outcome of his calculations, Anderson (1983)
noticed that readability estimates could be obtained by simply
calculating the average number (or rate) of long words per sentence. He
called his new measure Rix (rate index). Actually, sentence length is
still involved indirectly, as the number of sentences and the number
of long words must be counted and divided.
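
A rate of long words per sentence can be computed with a sketch like the
one below. The function name is hypothetical. The threshold of seven or
more letters is carried over from the description of Lix as an
assumption, and the sentence and word splitting rules are simplified.

```python
import re


def rix(text, long_word_min_letters=7):
    """Return the number of long words divided by the number of sentences.

    Assumptions: a long word has seven or more letters, a sentence ends
    at '.', '!' or '?', and a word is an unbroken run of letters.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = re.findall(r"[^\W\d_]+", text)
    long_words = sum(1 for w in words if len(w) >= long_word_min_letters)
    return long_words / len(sentences)


sample = "The cat sat. Readability matters greatly."
print(rix(sample))  # 1.5: three long words in two sentences
```

Only two counts are needed, the long words and the sentences, which is
why sentence length enters the measure only indirectly.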

Use of Cloze Procedures in Foreign Languages 

There is controversy regarding the use of cloze procedure in 
determining the readability of written materials. This controversy is 
based on the fact that cloze is a subjective evaluation that mirrors the 
language ability and background of information of the person taking 
the test. Also, some researchers feel that multiple cloze passages 
should be developed from each piece of material for the results to be 
valid. For example, a test deleting every fifth word should be pre- 
pared in five versions, omitting a different word each time. Though 
these views are shared by other countries, for want of a better tech- 
nique, cloze procedure is widely used. 
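
The preparation of multiple versions mentioned above can be sketched as
follows. The function name and the blank marker are hypothetical, and
the sketch splits on whitespace only, so punctuation stays attached to
the word it follows.

```python
def cloze_versions(words, every_nth=5, blank="_____"):
    """Build one cloze passage per starting position, so that each word
    is deleted in exactly one version.

    Returns a list of (passage, deleted_words) pairs.
    """
    versions = []
    for offset in range(every_nth):
        passage, deleted = [], []
        for position, word in enumerate(words):
            if position % every_nth == offset:
                passage.append(blank)
                deleted.append(word)
            else:
                passage.append(word)
        versions.append((" ".join(passage), deleted))
    return versions


text = "Cloze tests ask readers to restore words removed at regular intervals"
for passage, deleted in cloze_versions(text.split()):
    print(passage, deleted)
```

With a deletion rate of every fifth word, the five versions together
test every word in the passage once.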

A good example is the extensive research on the use of cloze
procedure in the measurement of the readability of Spanish language
reading materials by researchers in Venezuela (Bastidas, Calderón,
& Bravo, 1981; Morles, 1975, 1981; Rodríguez Trujillo, 1978,
1980, 1983) and Spain (López Rodríguez, 1983). Using Bormuth's
levels (1971), Morles (1981) discovered that in addition to being
appropriate for the determination of the student's ability to
comprehend a text, cloze could be used as an indication of what percentage
of the total group could handle the material by determining how
many students had scored more than 58 percent. Bastidas,
Calderón, and Bravo (1981) found that when cloze tests contained at
least fifty items, there was a high correlation between parallel forms
from the same material.
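
The group level use of cloze scores reported by Morles amounts to a
simple proportion, sketched below. The function name and the scores are
hypothetical; only the 58 percent cutoff comes from the text.

```python
def percent_able_to_handle(cloze_scores, cutoff=58.0):
    """Return the percentage of students whose cloze score is above
    the cutoff (more than 58 percent, as described above)."""
    if not cloze_scores:
        raise ValueError("at least one score is required")
    above = sum(1 for score in cloze_scores if score > cutoff)
    return 100.0 * above / len(cloze_scores)


# Four hypothetical students; a score of exactly 58 does not count.
print(percent_able_to_handle([40, 58, 59, 75]))  # 50.0
```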

The latter observation was in contrast to what Derakshani 
(1980) learned when he used cloze passages to determine the read- 
ability level of a Persian text on volleyball destined for the popular 
market in Iran. Derakshani compared scores on five parallel cloze 
tests with those on a twenty-five item language achievement and a 
ten item multiple choice comprehension test. Unlike Bastidas, he 
found that there was not always the same mean score for different 
versions of a particular passage. He was able to demonstrate statisti- 
cally that there was a positive relationship between skill in the use of 
the target language and achievement on a cloze passage. These
findings are similar to those of Entin (Entin, 1980; Entin & Klare,
1985) in the United States. Working with cloze tests based on two of
the five possible versions, she found that in sixteen of twenty-four
comparisons there were significant differences at the .05 level. 

Mikk, Sepp, and Hanson (1973) investigated the possibility 
of using cloze procedure to evaluate the readability of Estonian text. 
Research in which subjects were presented with a progressively 
larger number of words on either side of a deleted one, starting with 
three words, led to the conclusion that it was preferable to omit
every seventh word instead of every fifth. Two of their experiments 
indicated that requiring the exact words in the blanks yielded a bet- 
ter indication of the pupils' achievement and mental abilities. Ac- 
cepting alternative answers as long as they fit the content was a 
better indicator of the difficulty of the text. They concluded, how- 
ever, that it was more efficient to count only the exact word correct 
and determined that cloze tests consisting of about fourteen pages of 
a book were needed for accurate evaluation, with three pages rec- 
ommended if all of the words fitting the content were considered. 
Since words were deleted with less frequency, it was felt that the 
percentages for the various instructional levels usually used were 
not appropriate, and further research was indicated. 

Sukeyori (1957) tested the applicability of cloze procedure to 
the Japanese language. Experiments were conducted to determine 
what percentage of material should be deleted and whether it was
preferable to delete letters or words. Results showed that a deletion
pattern of 10 to 20 percent of the words was more suitable than
deletion of letters.

Other languages in which cloze has been used for readability 
measurement include French (Henry, 1973; De Landsheere, 1972, 
1973), Korean (Taylor, 1956), and Vietnamese (Klare, Sinaiko, & 
Stolurow, 1971). Klare (1974, pp. 95-96) offers additional
references for cloze research in some of the above languages, as well as
for Thai and for foreign speakers of English in Papua New Guinea. 

Differing Philosophies 

As with Kintsch (1979) in the United States, there are those abroad 
who would take other variables into consideration in determining 
the difficulty of reading materials. 

Among them is Richaudeau (1973, 1981), whose research 
convinced him that certain transformations, whether long or short, 
appear to stay with the reader longer. He criticized the validation of 
readability formulas with cloze procedure, pointing out that greater 
ability to complete a cloze test is directly related to the redundant 
material measured by such formulas. Richaudeau did not advocate 
abandoning formulas altogether, but he felt teachers and publishers 
should remember this and realize that the more redundancy there is 
in a text, the less interest it holds. 

Richaudeau stressed the importance of anticipation over
meaning, observing that we have built a network of neurological
pathways over which our knowledge of certain concepts travels.
Stimuli from outside reactivate these concepts. He rejected the
complete sentence as a unit, maintaining that the sousphrase (probably
what we would call the clause), punctuated with a period,
semicolon, colon, or dash, is the important unit.

Important aids to the readability of such clauses and
sentences include:

• The placement at the beginning of a clause of important
words such as the subject, verb, and principal adjective.

• The use of short, common words.

• The use of anticipated words as cues, e.g., I have ___
some ___ which ___.

• The use of an affirmative formula at the beginning, e.g.,
This is wh...

• The limitation of number of words separating subject and
verb.

Richaudeau's experimental formula measured the number of 
words retained after reading a sentence or clause based on three var- 
iables: the relationship between word length and sentence length as 
plotted on a graph, whether the sentence began with an affirmative 
formula, and the number of words between the subject and the verb. 

Platzack (1974) in Sweden conducted readability experiments 
that included research on how physical, syntactic, semantic, and 
contextual cues influence the difficulty level of written materials. 
He reached the following conclusions: 

• Physical cues, like punctuation marks and short structure 
words, help to set off decoding units.

• A text in which relative pronouns have been deleted is often 
less readable than when these are present. 

• Eye-voice span becomes wider when certain short words 
are present as cues to underlying structure. 

• A sentence in which an adverbial clause is placed between 
the verb and the object of the main clause is more difficult 
to read than one in which the adverbial clause is placed 
after the object. 

Platzack maintained that a sentence with a mean length of
approximately thirteen words is easier to read than one in which the
mean sentence length is less than nine words, assuming "the texts
are of the same difficulty." Quoting Smith and Holmes (1973), he
observed that long term memory can take in new materials every
third to fifth second only and deduced that someone who reads 200
words per minute would be able to read 10 to 17 words during the 
interval when long term memory is locked. A short sentence, there- 
fore, might not make it into long term memory. Readers would ei- 
ther have to wait or to read more than one sentence before they were 
able to store what had been read. 
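
The arithmetic behind Platzack's deduction can be checked with a short
sketch. The function name is hypothetical; the reading rate and the
three to five second interval are the figures quoted above.

```python
def words_read_during(words_per_minute, seconds):
    """Number of words read in the given number of seconds."""
    return words_per_minute * seconds / 60.0


# At 200 words per minute, three to five seconds of reading.
shortest = words_read_during(200, 3)
longest = words_read_during(200, 5)
print(shortest, round(longest))  # 10.0 17
```

A sentence of fewer than nine words would therefore end before the
shortest interval has elapsed, which is the basis for the claim that
very short sentences may be harder to store.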

Conclusion

The development of readability instruments for the evaluation 
of foreign language text has been an ongoing process. It began with 
research in the United States on materials used in second language 
learning and has extended to most of the civilized world, first by the 
adaptation of formulas intended for English language text and then 
by original investigation. 

Although research in the United States has tended to narrow 
the selection of variables contributing to text difficulty to semantic 
(word length) and syntactic (sentence length) factors, there are
foreign researchers who have seen fit to increase these. A notable
exception is Bjornsson of Sweden, whose investigations produced the
additive formula Lix.

Both subjective judgment and cloze procedure have been 
used extensively abroad in the development of criteria on which to 
base readability instruments. In situations where formulas have not
been feasible, cloze procedure (often using Bormuth's levels) has
been adapted to local needs. 

There are those who feel that the variables used in the
development of readability measures for second language text should
differ from those variables used for one's native tongue. Concern has
been voiced by investigators abroad who believe, as some American
researchers do, that such factors as physical, psycholinguistic, and
contextual cues should be considered when evaluating the difficulty
level of written text.

Finally, there appears to be a place for continued
investigation into the factors that affect the comprehensibility, or difficulty
level, of textual materials in both foreign languages and English. It
is apparent, as one studies extant research, that the procedures 
needed for developing a readability measure for any language are 
readily available to anyone interested in going through the process. 
Representative Readability Formulas and Research
in Languages Other than English*

Author and Date of Publication / Formula or Research / Observations

Chinese

Yang, 1971
  Formula or Research:
    Y = 14.95961 (constant) + 39.077461 × (WORDLIST)
        - 2.4849 × (STROKES) + 1.11506 × (FULLSEN)
    FULLSEN = proportion of words in 5,600 simple word list
    STROKES = average number of strokes per character
  Observations:
    Criteria: results of standardized tests based on 85 passages
    from modern Chinese writings administered to first and second
    year Taiwanese high school students. Multiple correlation of .80
    with independent (character, word, and sentence factors) and
    dependent (comprehension) variables. Word factors explained 60
    percent of variance, character factors 50 percent, and sentence
    factors 12 percent when taken alone; 64 percent when all three
    were considered together.

Danish

Lix Committee (Jakobsen, 1971)
  Formula or Research:
    Lix = Ml + Lo
    Ml = average length of meaning, i.e., sentence length
    Lo = percentage of long words, i.e., words with more than 6
         letters
    In texts below 3,000 words the whole text is lixed. For larger
    texts, there is a procedure for spot checks.
  Observations:
    Based on Björnsson's Swedish research and the work of an
    official Danish Lix Committee convened in the 1960s and still
    active.

*Excludes research in which only cloze procedure was used.
Dutch

Douma, 1960
  Formula or Research:
    Uses Flesch formula, estimating words and sentences to be 10
    percent longer in Dutch than in English.
    Ease = 206.84 - 0.77sw - 0.93ws
    sw = syllables per 100 words
    ws = words per sentence
  Observations:
    Generalized from five texts. Flesch's two coefficients are
    reduced by 11 percent.

Brouwer, 1963
  Formula or Research:
    Uses average length of words and average length of sentences as
    indices of difficulty. Places the two indices on the same
    footing.
    Ease = 195 - 2/3 sw - 2ws
  Observations:
    Criteria developed from study of 25 children's books.

van Hauwermeiren, 1972
  Formula or Research:
    L = 109.549 - 29.971x1 - 0.986x? + 0.967x10
    L = readability level
    x1 = average number of syllables per word
    x? = average number of nouns per 100 words (subscript illegible)
    x10 = average number of auxiliary verbs per 100 words
  Observations:
    Validity = .65. Five other formulas exist, with validities from
    .60 to .67. Criterion: cloze.

Zondervan, van Steen, & Gunneweg, 1976
  Formula or Research:
    Grade 3: 6.44x1 + 5.42
    Grade 4: 6.58x1 + 40.68
    Grade 5: 5.76x1 - 2.86x2 + 45.67
    Grade 6: 6.07x1 - 3.20x2 + 41.64
    x1 = percentage of different, difficult long words
    x2 = percentage of auxiliary verbs
    Difficult long words are those with more than 3 syllables not on
    list of 35 most frequently used Dutch long words.
  Observations:
    Criteria: cloze. Based on nonfiction texts for grades 3, 4, 5, 6.
Staphorsius & Krom, 1985
  Formula or Research:
    Index for manual computation:
    0.798 - 0.329 GWLG - 0.004 GZLG + 0.588 WW + 0.472 SUB
    + 0.654 PERS
    GWLG = mean length of words in syllables
    GZLG = mean length of sentences in syllables
    WW = proportion of verbs
    SUB = proportion of substantives
    PERS = proportion of personal pronouns
    Index for machine computation:
    1.284 - 0.15 GWL - 0.010 GZW
    GWL = mean length of words in letters
    GZW = mean length of sentences in words
  Observations:
    Criteria: cloze. Recommended for nonfiction texts in grades 3,
    4, 5, 6.

French

Tharp, 1939
  Formula or Research:
    Proposes an Index of Difficulty in which the frequency index of
    a piece of reading material is divided by the density. Density
    is obtained by dividing the number of running words by the
    number of burden words.
  Observations:
    Contrasts burden words with gift words, i.e., cognates and
    proper nouns. Stresses value of basic word lists for authors of
    second language texts.

Kandel & Moles, 1958
  Formula or Research:
    Adaptation of Flesch Reading Ease.
    Ease = 207 - 1.015ws - 0.736sw
    ws = number of words per sentence
    sw = number of syllables per 100 words
  Observations:
    Because French words on the average are longer than English
    words, the coefficient for sw is divided by 1.15. Counting
    procedure not adapted to French.



De Landsheere, 1963
  Formula or Research:
    Flesch formula with coefficients unchanged but ways of counting
    specific to the French language.
  Observations:
    Flesch formula computerized.

De Landsheere, 1966
  Formula or Research:
    Lexical base from Verlée word frequency list. Syntactic base a
    function of the way punctuation divides text.
  Observations:
    Technique abandoned because of inconsistencies among authors in
    use of punctuation. Replaced by a more economical method using
    the Gougenheim word list.

Henry, 1973
  Formula or Research:
    Three sets of formulas:
    1. 8 variable version requiring a knowledge of linguistics.
    2. 5 variable version ideal for computer.
    3. 3 variable version using
       - number of words per sentence
       - number of words absent from the Gougenheim word list
       - first names used alone + exclamation marks + quotation
         marks
    Graphs available for use with third formula.
  Observations:
    Criteria: 5 parallel cloze tests each from 60 books, levels 1
    through 12. Primary levels eliminated later. Three formulas in
    each set, one each for levels 5-6, 8-9, and 11-12. Interpolation
    possible for other levels. Not recommended for primary levels.

Richaudeau, 1973
  Formula or Research:
    Number of words retained from a clause = A + B + C
    A = Score from plotting intersection of average number of
        letters per word & average number of words on a graph.
    B = Addition of 2 points if sentence begins with an affirmative
        formula (This is why, etc.).
    C = Subtraction of 2 points when distance between subject and
        verb is more than 10 words.
  Observations:
    Experimental formula based on the idea that the more readable a
    text, the more easily it is retained.
German

Fucks, 1955
  Formula or Research:
    Difficulty level = sentence length × word length.
    Based on a difficulty scale.
  Observations:
    Judged inappropriate for German probably due to the frequency
    of long words in that language.

Walters, 1966
  Formula or Research:
    Y = 801.12 - 40.77 × (average number of verb segments per
        sentence) - 172.32 × (density of modification in nominal
        units)
    Y = difficulty index with range from approximately 100 (very
        difficult) to 550 (very easy)
  Observations:
    Criterion: subjective judgment by author and 38 others of 15
    300-word theological texts. Special purpose formula for use
    with theological literature in German. 33 formulas or their
    variations developed.

De Landsheere, 1970
  Formula or Research:
    German version of Flesch formula using same principles for
    counting as De Landsheere's French language version.
  Observations:
    Never used practically.

Briest, 1974
  Formula or Research:
    Verb intensity = number of words divided by number of sentences.

Schwartz, 1975
  Formula or Research:
    German Readability Graph similar to Fry Graph. Along horizontal
    axis, number of syllables per 100 words ranges from 125-189.
    Along vertical axis, number of sentences per 100 words ranges
    from 2.0-20.0. Predicts to grade 8+. Shows little difference
    between materials for grades 3 and 4.
  Observations:
    Criteria: 100 word samples from West German readers of postwar
    era for grades 1-8; 15-21 samples for grades 2-8; and 11-15
    samples for grade one. Average number of syllables per 100
    German words greater than English by 25-37. Number of sentences
    close for corresponding levels.
Nestler, 1977
  Formula or Research:
    Conceptual level of words used to estimate difficulty level of
    texts. Three levels identified:
    1. Generally known words.
    2. Rare words.
    3. Rare professional words.
  Observations:
    Too difficult to use with complete passages. Only feasible with
    single sentences.

Dickes & Steiwer, 1977
  Formula or Research:
    Developed 8 variable, 6 variable, and 3 variable formulas.
    Three variable version is similar to Flesch instrument.
  Observations:
    Multiple correlation with cloze scores on 60 German texts of
    .91, .89, and .87 respectively.

Bamberger & Vanecek, 1982, 1984
  Formula or Research:
    Developed a number of formulas, most requiring use of word list
    of 1,000 most common words in written language of a German
    speaking 10 year old. Exception is 4.WSF (fourth Viennese
    formula for nonfiction).
    Grade level = 0.2656sl + 0.2744ms - 1.6939
    sl = sentence length
    ms = multisyllabic words
    Also devised subjective and other objective methods for
    evaluation of fiction and nonfiction. Subjective = checklist
    for evaluating 5 nonlanguage variables: content, organization,
    print, style, and motivation. Objective = separate profiles of
    language variables for fiction and nonfiction using additive
    techniques.
  Observations:
    Criteria: 120 children's story books and 200 nonfiction
    juvenile books arranged into grade levels through use of pooled
    subjective assessment and by applying readability formulas used
    in development of profiles of language variables. Correlation
    with criterion of 4.WSF .9724.
Hebrew

Nahshon, 1957
  Formula or Research:
    GS = .236x1 + .1338x2 - 3.305
    GS = grade level at which an Israeli student can comprehend a
         passage without assistance
    x1 = percentage of different hard words
    x2 = average sentence length in words
  Observations:
    Eight readability formulas developed for Hebrew prose. This is
    shortest with correlation of .868.

Hindi

Bhagoliwal, 1961
  Formula or Research:
    Applied Johnson (1930), Flesch Reading Ease (1948),
    Farr-Jenkins-Paterson (1951), and Gunning (1952) formulas to 31
    short stories in Hindi.
  Observations:
    No Hindi word lists available, therefore limited to formulas
    involving syllable counts. Found Farr-Jenkins-Paterson best as
    it does not involve count of polysyllabic words, a problem in
    Hindi.

Korean

Park, 1974
  Formula or Research:
    Multiple regression equation with five variables: easy words,
    different words, different hard words, simple sentences, and
    pronouns, geared to materials for grades 2-9.
  Observations:
    Criteria: graded language and social science books required for
    Korean schools by the Ministry of Education. Formula found more
    predictive for samples at lower grade levels.

Russian

Rock, 1970
  Formula or Research:
    Readability graph based on the compilation of vocabulary items
    appearing in at least half of the Russian high school textbooks
    used in the U.S. and the percentage of unknown words that will
    result in the understanding or misunderstanding of authentic
    Soviet written text as demonstrated by research.
  Observations:
    Preliminary study showed the acquisition of vocabulary is more
    difficult for American students in Russian than in German or
    Spanish. Result is slower development of proficiency in reading
    authentic materials in Russian than in the other two languages.
Spanish

Spaulding, 1951, 1956
  Formula or Research:
    Difficulty = 1.609(ASL) + 331.8(D) + 22.0
    ASL = average sentence length
    D = density, based on the 1,500 word Buchanan word list
    Graph also available on which to plot variables. Rating is on a
    scale of from 20 (exceptionally easy) to 200 (exceptionally
    difficult).
  Observations:
    Reliability for this formula is .87. More complex earlier one
    exists. This version widely used in Latin America.

Fernandez Huerta, 1959
  Formula or Research:
    Adaptation of Flesch Reading Ease.
    Ease = 206.84 - 0.60P - 1.02F
    P = number of syllables per 100 words
    F = number of sentences per 100 words
  Observations:
    Tried Kandel & Moles Flesch adaptation for French first but
    found it had limited application.

Gutierrez, 1972
  Formula or Research:
    Readability = 95.2 - 9.7(L/w) - .35(w/s)
    L = number of letters
    w = number of words
    s = number of sentences
    Validated at sixth grade level only.
  Observations:
    Criteria: results of cloze tests administered to students in
    grade 6. Score yielded is in terms of average percentage of
    answers to a cloze test which students at this level would get
    on the passage being evaluated.

Patterson, 1972
  Formula or Research:
    Elaboration of Spaulding formula to help religious workers
    simplify written materials for readers with minimal reading
    skills.

Thonis, 1976
  Formula or Research:
    Established grade levels for Spaulding's formula based on
    Patterson's descriptions of the reading ability needed to
    understand materials yielding various indices on the Spaulding
    scale.
  Observations:
    Shortage of substantiating evidence makes grade level
    equivalency questionable.
Garcia, 1977
  Formula or Research:
    Adaptation of Fry Readability Graph modifying horizontal and
    vertical axes to reflect differences in syllable and sentence
    length. FGAS (Fry Graph Adapted to Spanish).
  Observations:
    Criteria: basal reading series in Spanish. Geared to English as
    a second language.

Gilliam, Peña, & Mountain, 1980
  Formula or Research:
    Adaptation of Fry Readability Graph retaining count for average
    number of sentences and subtracting 67 from average syllable
    count before plotting it on the graph.
  Observations:
    Criteria: 13 textbooks and 9 juvenile books written in Spanish
    for use in grades 1-3; publishers' grade level on English
    version available and assumed to be on same readability level
    as Spanish text. 15 books had same readability in both
    languages. Suitability of graph for primary materials only
    evaluated.

Vari-Cartier, 1981
  Formula or Research:
    Adaptation of Fry Readability Graph increasing syllable count,
    altering sentence count, and changing readability designations
    to reflect the four general levels of second language study:
    beginning, intermediate, advanced intermediate, and advanced.
    FRASE (Fry Readability Adaptation for Spanish Evaluation).
  Observations:
    Criteria: 127 samples from 66 American textbooks for Spanish
    language instruction. FRASE graph designations correlated with
    subjective teacher judgments, Spaulding formula ratings, cloze
    test scores, and informal multiple choice tests in a range of
    from .91 to .97.
López Rodriguez, 1981, 1982
  Formula or Research:
    Index of Difficulty = 95.4339 - 0.0756x1 + 0.2012x4
        - 0.0669x16 - 0.0728x19 - 35.202x21 - 1.0601x22
        + 0.7783x26
    x1 = commas
    x4 = period and new paragraph
    x16 = words per sentence
    x19 = words of more than 3 syllables
    x21 = measure of redundancy
    x22 = Common Vocabulary of Garcia Hoz
    x26 = expanded list of Spaulding
  Observations:
    Criterion: cloze. Multiple correlation with criterion of .5618.
    Developed two more formulas for grades 7 and 8 separately.

Rodriguez Dieguez, 1983
  Formula or Research:
    Index of Difficulty = 59.929 - 0.098x1 - 0.321x2
        + 4.428 log(x4) + 0.108x11 + 0.200x12 - 7.079 log(x16)
        - 25.816 log(x21) - 0.007(x22)^2 - 0.012x25
        [sign illegible] 0.126x27 - 70.420x28 + 5.502x31
    x1 = commas
    x2 = semicolons
    x4 = period and new paragraph
    x11 = proper names
    x12 = numerals
    x16 = words per sentence
    x21 = measure of redundancy
    x22 = Common Vocabulary of Garcia Hoz
    x25 = personal pronouns
    x27 = total of periods
    x28 = deviation of the distribution of letters per word
    x31 = mean of letters per word + 2.58 deviations
  Observations:
    Criterion: cloze. Multiple correlation with criterion of .716.
    Added 8 variables to the 26 studied by López Rodriguez in
    search for his formula.
Crawford, 1984
  Formula or Research:
    Readability graph based on the following regression equation:
    Grade level = [number of sentences per 100 words × (-.205)]
        + (number of syllables per 100 words × .049) - 3.047
  Observations:
    Criteria: 186 passages from Laidlaw series of Spanish readers,
    grades 1-6. Average sentence length and number of syllables per
    100 words tabulated, their mean and SD calculated, and a
    multiple regression analysis of the data performed.

Swedish

Björnsson, 1968a, 1983
  Formula or Research:
    Lix = average number of words per sentence + percentage of long
          words
    Long words = those with seven or more letters
    Twenty 100 word samples each for word length and 10 samples for
    sentence length recommended for SD of only 1.0. Rating on scale
    of from 20 (very easy) to 60 (very difficult); converted into
    grade levels for some languages by other researchers.
  Observations:
    Criteria generally used for Swedish and other languages:
    approximately 100 varied passages whose difficulty has been
    rated by 20-30 persons. Validity of .95 on text evaluated when
    recommended number of samples used.

Platzack, 1974
  Formula or Research:
    Studied influence of punctuation marks, use of word order,
    relative pronouns, and sentence length on readability.
  Observations:
    Research based on generative transformational grammar.

Vietnamese

Nguyen & Henkin, 1985
  Formula or Research:
    RL = 2WL + .2SL - 6
    WL = average word length
    SL = average sentence length
    Compound words counted as one word. Each tonal mark, word mark,
    and hyphen (in a compound word) counted as a letter.
    Readability table and scale available for easy computation.
    Readability given in terms of grade levels.
  Observations:
    Criteria: 20 passages of approximately 300 words each from
    Vietnamese novels, magazines, and textbooks from grades 4
    through college.
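
Several of the formulas tabulated above are plain linear combinations of two counts, so they are easy to compute once the counts are in hand. The sketch below is only an illustration: the coefficients come from the table entries for Björnsson (Lix), Crawford, and Nguyen & Henkin, but the function names and the crude regular-expression tokenization are assumptions of this example, not the authors' published counting procedures.

```python
import re


def lix(text: str) -> float:
    """Lix = average words per sentence + percentage of long words.

    Long words are those with seven or more letters, as in the table.
    Sentence and word splitting here is a simplifying assumption.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and word")
    long_words = sum(1 for w in words if len(w) >= 7)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)


def crawford_grade(sentences_per_100_words: float,
                   syllables_per_100_words: float) -> float:
    """Crawford (1984) regression equation for Spanish, as tabulated."""
    return (-0.205 * sentences_per_100_words
            + 0.049 * syllables_per_100_words
            - 3.047)


def vietnamese_rl(avg_word_length: float, avg_sentence_length: float) -> float:
    """Nguyen & Henkin (1985): RL = 2WL + .2SL - 6, as tabulated."""
    return 2 * avg_word_length + 0.2 * avg_sentence_length - 6
```

For real use, the counting rules in each entry (for example, how tonal marks or compound words are counted in Vietnamese) would have to replace the simple tokenization assumed here.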
Edward B. Fry



Writeability: The Principles of Writing
for Increased Comprehension



Readability formulas are concerned with judging the difficulty
levels of writing. Frequently, textbooks, consumer contracts,
and a wide variety of reading matter are found too difficult or
unreadable for the intended audience. Writeability is concerned with
writing, rewriting, or editing to get those materials to the desired
readability level.

Writeability helps writers and editors produce materials that
can be comprehended more easily without cheating. Cheating is
defined as trying to beat the formulas by artificially chopping
sentences in half and selecting any short word to replace a long
word. You can cheat on an IQ test to get a higher score, but cheating
will not change your intelligence. Likewise you can cheat or
artificially doctor writing to get a lower readability formula score,
but you might not have changed the true readability much and you may
have made it worse.

True readability is the goal of most authors. They want to
communicate ideas to the reader. Increased readability can be
demonstrated by higher scores on comprehension questions, ability to
fill in more blanks in a cloze passage, fewer oral reading errors, a
tendency to spend more time reading, more mature eye movements,
and subjective judgment of the reader.

Despite criticisms of readability formulas, their use has never
been more popular. The formulas have had a profound influence on
the textbook publishing industry in the past ten years. Most editors
and sales personnel can tell you the readability score of their
materials. The influence on technical materials has been equally
great, and readability formulas have influenced the writing of such
diverse products as army maintenance manuals, insurance policies,
and income tax booklets.

New York, New Jersey, Hawaii, Connecticut, and Minnesota
have passed Plain Language Laws. California and Michigan are
actively considering joining the group. In Oklahoma the State
Department of Education checks ballot propositions for readability.

Plain Language Laws differ from state to state, but their basic
goal is to simplify all types of consumer contracts such as rent
agreements, money lending forms, and the fine print of insurance
policies. The New Jersey law states that "a consumer contract shall
be simple, clear, understandable, and easily readable."

In 1978, President Carter sent Executive Order 12044 on
plain language to all government departments; many states have
similar movements.

Basing reading tests on a readability formula is important.
The Degrees of Reading Power, now taken by every student in New
York state and used in Boston and parts of Connecticut, is not a
norm based (standardized) test. Rather, it yields a readability
score. In other words, it tells the teacher or administrator what
books the students can read. It does not compare one student with
another, though it could be used in that manner.

The basic idea behind readability has always been to help
writers, editors, teachers, and librarians to match the difficulty of
written material with the reading ability of the student. A good
match improves communication and learning.

This article will discuss how readability formula scores can 
be lowered without cheating. 

Vocabulary 

Since a major input of most readability formulas is vocabu- 
lary difficulty, one way to lower readability scores is to use simpler 
vocabulary. 



Writers and editors can use word frequency lists as guides.
The American Heritage list (Carroll, Davies, & Richman, 1971)
that ranks 87,000 words found in frequency counts of 5 million
running words has largely replaced the older but still valuable
Thorndike-Lorge lists. Most writers will find these lists too
expensive and cumbersome. A shorter and more usable list is Sakiey
and Fry's (1979) list of the 3,000 most used words in English, both
in rank and in alphabetical order. The commonness of the variant
forms (adding s, ed, ness) is given for each word. Even shorter word
lists like the Dolch 220 Basic Sight Words or the first 300 Instant
Words can be useful to writers of primary material.

The first rule for writers is to use more common (high
frequency) words. For example, don't "prolong a process"; "keep it
short." Take care not to substitute words of Latin or Greek origin
for common words. For example, "proceed" often means "go," and
"secure" often means "safe." Words beginning with pre, dis, or multi
can often be replaced with easier words.
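
One way a writer or editor might apply a word frequency list is to flag every word in a draft that is not on the list, then decide case by case whether a more common word would serve. The sketch below assumes the list has already been loaded into a Python collection; the function name and the tiny sample list are hypothetical and stand in for a real list such as those cited above.

```python
def uncommon_words(words, common_words):
    """Return the words (in order) that are not on the supplied common-word list.

    Matching is case-insensitive. `common_words` is whatever frequency
    list the writer has chosen; none is bundled with this sketch.
    """
    common = {w.lower() for w in common_words}
    return [w for w in words if w.lower() not in common]


# Hypothetical miniature list, for illustration only.
sample_common = ["keep", "it", "short", "go", "safe", "a", "the"]
draft = ["Prolong", "the", "process"]
flagged = uncommon_words(draft, sample_common)  # ["Prolong", "process"]
```

A flagged word is only a prompt for the writer's judgment; as the text notes below, a longer word is sometimes the right one.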

Much of what is commonly called jargon or gobbledygook is
simply someone using large words to sound pretentious or
self-imposing. Readability formula makers are aware that there are
times when the longer word is necessary and should be used. Longer
words can add precision, clarity, and grace to an author's writing.
But use too many and you will get a truly higher readability score.
Changing the frequency of 15 percent of the words in a sixth grade
basal reader story, for example, significantly increased reading
comprehension performance (Marks, Doctorow, & Wittrock, 1974).

A group of science textbook editors wanted to use the word
"temperature" in a primary book and "osmosis" in an upper level book.
They argued that it is awkward not to use the correct word, and,
furthermore, students should learn to read those words. I agreed,
and a modification of my formula appeared in Publishers Weekly
(1979), and is restated here. Any term presumed to be new and
difficult for the readers should:

1. Be defined or used in context the first time it appears in
such a way that its meaning is apparent and

2. Be followed for the next three times it appears by the
phonetic spelling or syllabification of the word as found in any
commonly used dictionary. For example (syl lab i fy) or
(si lab' ə fī).

3. The new term now can be counted as a two syllable word
regardless of its length.

4. These new terms should appear in the teacher's guide; as a
new word list at the beginning or end of a chapter in the
students' book; and, possibly, in a glossary.
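
The effect of rule 3 on a syllable count can be sketched in a few lines. This is an illustration under stated assumptions: the vowel-group syllable estimate is a rough heuristic chosen for brevity, not the counting procedure of any published formula, and the function names are hypothetical. Only the "count a taught term as two syllables" step comes from the rule above.

```python
import re


def estimate_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def adjusted_syllable_count(words, taught_terms=()):
    """Total syllables, counting each taught term as two syllables (rule 3)."""
    taught = {t.lower() for t in taught_terms}
    total = 0
    for word in words:
        if word.lower() in taught:
            total += 2  # a properly introduced term counts as two syllables
        else:
            total += estimate_syllables(word)
    return total


sentence = ["the", "temperature", "rose"]
plain = adjusted_syllable_count(sentence)
adjusted = adjusted_syllable_count(sentence, taught_terms=["temperature"])
```

With the heuristic above, the adjusted total is lower than the plain total whenever a taught term has more than two estimated syllables, which is the intended relief for authors who introduce necessary terms carefully.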

In brief, the author is teaching the term to the reader in a
helpful, meaningful way. I'm in favor of vocabulary improvement;
I'm just opposed to bad communication that occurs when the author
uses words the reader does not know.

Another important type of word list is based on words known
by students at given grade levels. The most impressive of these is the
Living Word Vocabulary (Dale & O'Rourke, 1976), which lists
meanings of words known at different grade levels. For example, the
word "run" has many meanings. At grade four, students know it as a
baseball word; at grade six, the way a political candidate uses it; at
grade eight, as the way to manage a business; and at grade twelve,
as a sudden demand. No present readability formula takes this
degree of complexity into account, though some formulas (such as the
Dale-Chall) do count as unfamiliar or difficult those words not
known by a majority of fourth graders. Meaning lists are valuable as
a writer's resource.

Sentences 

The other major input to most readability formulas is a mea- 
sure of syntactical complexity. To put it briefly, readability formulas 
measure average sentence length. They could measure grammatical 
constructions such as prepositional phrases or subjunctive clauses, 
but most of these measures correlate highly with average sentence 
length. Hence, the obvious instruction to authors is keep your sen- 
tences short— on the average. 

Nothing is more boring reading than a long series of short
choppy sentences. On the other hand, nothing makes writing less
understandable than very long sentences. Variety is necessary to
express yourself properly and to interest your reader. Short
sentences hit hard. Longer sentences subtly suggest that the idea
simply cannot be expressed without some reservations and
qualifications.

Historically, sentences are getting shorter. Flesch (1974)
reports that Elizabethan sentences averaged 45 words, while Victorian
sentences averaged 29 words. He found that sentences in magazines
such as Time and Reader's Digest averaged 18 words in 1949 and 15
to 17 words in 1974. Flesch estimates that there was a 10 percent
shortening in sentence length in twenty-five years, a rather hefty
shrinkage. The American Press Institute conducted a Reading
Comprehension Survey of 410 daily newspapers and found a strong
correlation between words per sentence and reader comprehension. In
most cases, the shorter the sentence, the more easily it was
understood.

Just shortening sentences is not the total answer. Critics of
readability formulas have pointed out that sometimes longer
sentences communicate better. For example, you might break a long
sentence like "Farmer Brown didn't go to town because the roads were
icy" into two simple sentences as "Farmer Brown didn't go to town.
The roads were icy." If you then asked students, "Why didn't Farmer
Brown go to town?" you might find that more of them got the answer
correct after reading the longer sentence. This is why readability
formulas say that "on the average" sentences should be shorter for
better communication. They do not say that every sentence should
be short.
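
The Farmer Brown example can be checked numerically. The sketch below computes average sentence length for both versions; the splitting rules (sentences end at ., !, or ?; a word is a run of letters and apostrophes) are simplifying assumptions of this example rather than the counting rules of any particular formula.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence, using simple splitting rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


long_version = "Farmer Brown didn't go to town because the roads were icy."
split_version = "Farmer Brown didn't go to town. The roads were icy."

long_avg = average_sentence_length(long_version)    # 11 words in 1 sentence
split_avg = average_sentence_length(split_version)  # 6 and 4 words in 2 sentences
```

The split version scores as "easier" on this input even though it drops the word "because," which is exactly the kind of mechanical gain the passage warns may not reflect better comprehension.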

Some longer sentences are said to add cohesion to writing
(Kintsch & van Dijk, 1978; Kintsch & Vipond, 1979). An example
of this is the "because" sentence used earlier about Farmer Brown.
Other examples of longer sentences that could add to easier reading
are if/then and either/or sentences. But in general, the basic
readability principle is that shorter or less embedded sentences are
easier to read.

Remember, readability formulas are not meant to be writer's
guides. They are meant to judge the difficulty of a prose passage
after the material has been written. This article is a writer's guide.

There are two more important kinds of sentence complexity
that readability formulas cannot pick up. The first is the Kernel
Distance Theory. It states that splitting the subject and verb-object
(the kernel of the sentence) with distance (words) causes poor
readability. An example of a split kernel would be "Children, if they
don't wish to get colds, should wear mittens." An unsplit kernel
would read "Children should wear mittens if they don't wish to get
colds." This theory also states that distance in front of the kernel
is worse than distance at the end of the kernel. For example, "If they
don't wish to get colds, children should wear mittens." Note that all
three sentences have the same vocabulary and the same sentence
length and would get the same readability score, but research has
shown that split kernel sentences yield worse communication. Other
writers would call kernel splitting "embedding."

A second factor, often mentioned in rhetoric books, is that
active sentences communicate better than passive sentences. Don't
say, "The test was taken by the students." Say, "The students took the
test." Sometimes, as in this case, the active sentence is shorter. Klare
(1980) suggests that writers should use active verbs rather than
nominalizations (verbs made into nouns). For example, the verb "to
sign" can be nominalized as "signature." This leads to indirect
writing like "Your signature must be affixed to the form." It is
better to say "You must sign the form."

Most of the time, punctuation is helpful to the reader. A
notable exception is the overuse of commas. Sprinkling too many
commas in a sentence means that the flow is choppy and may also
indicate that you have a heavily embedded sentence or one that is
too long. Too many commas warn the writer or editor that something
may be amiss, and so does the use of semicolons or colons.
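
The comma warning lends itself to a simple editing aid. The sketch below flags sentences that contain more commas than a chosen threshold, or any semicolon or colon. The threshold of two commas is an arbitrary assumption for illustration (the text gives no number), and the function name is hypothetical.

```python
import re


def punctuation_warnings(text: str, max_commas: int = 2):
    """Return (sentence, comma_count, has_semicolon_or_colon) for flagged sentences.

    A sentence is flagged when it has more than `max_commas` commas or
    contains a semicolon or colon. The default threshold is an assumption.
    """
    warnings = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        commas = sentence.count(",")
        has_semicolon_or_colon = ";" in sentence or ":" in sentence
        if commas > max_commas or has_semicolon_or_colon:
            warnings.append((sentence, commas, has_semicolon_or_colon))
    return warnings
```

A flag is a reason to reread the sentence, not proof that it is bad; a list of items, for instance, legitimately uses several commas.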

Paragraphs 

On the average, paragraphs should be short. Paragraphs are
intended to guide the reader into seeing units of thought, gestalts, or
schemata. Whatever they are called, they should have some kind of
psychological units of cohesion. A very long paragraph often
contains too many different ideas; short paragraphs have punch.

The traditional English textbook admonition about writing a
well structured paragraph with main idea, supporting details, and
conclusions is partly a myth. It might help to train neophyte writers,
but it simply isn't followed by many professional writers. Take a
look at Hemingway or any newspaper and you will see plenty of
short, even one sentence, paragraphs. There are times when you
will want to use a large, well structured paragraph, but don't be too
bound by some formal ideal of paragraph structure.

Another factor related to paragraphing is the use of lists.
They are particularly useful in technical writing or in directions.
Lining up a list of terms or objects is often better than stringing them
all together in a sentence, separated by commas. This is another
instance of too many commas being a danger signal.

Organization 

Selecting the proper organization for an article or a chapter is
part of the art of being a writer. Some subject matters lend
themselves more to one form of organization than to another. For
example, history often relies heavily on a chronological
organization, but effective theme or problem centered histories have
been written.

One type of organization effective in expository writing is the
Statement-Example-Restatement (SER) sequence. SER includes
repetition, giving concrete examples, and restating the principle in
another way.

Subheads contribute to understanding an article and to
informing the reader what organizational pattern the writer is using.
Good subheads can act a little like Ausubel's advance organizers,
Rothkopf's interspersed questions, or Kintsch's discourse pointers.
Subheads also help the overview and review processes recommended
in the SQ3R and other study skill techniques. They are used by
skilled readers to improve comprehension or retention of the
material.

Clearly written materials use many signal words to indicate
the author's organization to the reader. Signal words can indicate (1)
sequence and rank order, such as first, second, next, last, in
conclusion; (2) that a reverse idea is coming: however, but, on the
other hand; and (3) that the author is not absolutely certain: maybe,
if, allegedly, might. Other words signal that an example is coming up
or that ideas might be paralleled or a choice given.
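
The three groups of signal words listed above can be turned into a small lookup for checking a draft. In this sketch the word lists are taken directly from the paragraph, while the category labels, the dictionary name, and the function name are assumptions introduced for illustration.

```python
import re

# Word lists from the paragraph above; the category labels are hypothetical.
SIGNAL_WORDS = {
    "sequence": ["first", "second", "next", "last", "in conclusion"],
    "reversal": ["however", "but", "on the other hand"],
    "uncertainty": ["maybe", "if", "allegedly", "might"],
}


def find_signal_words(text: str) -> dict:
    """Return, per category, the signal words or phrases found in the text."""
    lowered = text.lower()
    found = {}
    for category, phrases in SIGNAL_WORDS.items():
        hits = [p for p in phrases
                if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
        if hits:
            found[category] = hits
    return found
```

A passage that returns an empty result is not necessarily disorganized, but it may be worth checking whether the reader is being told how the ideas connect.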
Cohesion 

Cohesion refers to how well a paragraph or a passage "hangs
together." Disjointed sentences or paragraphs indicate lack of
cohesion. Cohesion was illustrated in the "because" sentence earlier
to show that ideas or concepts sometimes can be communicated better
in longer sentences. The "because" tied two ideas together to make
them more cohesive.

On a connected discourse level (units of prose longer than
one sentence) the SER is another type of cohesion where different
parts of the story are repeated and interrelated. Signal words and
summaries help tell the reader how the article is cohesive.

A number of current researchers such as Halliday and Hasan 
(1976), Kintsch and Vipond (1979), and Meyer (1977) have been 
concerned with cohesiveness, analyzing propositions (single 
thoughts, ideas, or concepts), and showing how they are related. 
They use terms like links, ties, and networks. In general, they would 
argue that cohesion is aided by more links between propositions. 
Hence, moderate use of referents (though not too distant) is good. 

Traditionally, paragraphs should be cohesive by being about a
single thought. In modern terminology, paragraphs should be
cohesive by having the propositions well linked, which means, "Do
not jump from idea to idea too quickly." Many different ideas in a
short passage make readability difficult. Cohesion extends beyond
the paragraph to entire books.

Personal Words 

The American Psychological Association condones and, at
times, encourages the use of personal pronouns in scholarly writing.
For example, many reports conclude with "It was found that...."
Who or what is it? Chances are that it is none other than the author,
who prefers a literary castration rather than the use of a personal
pronoun, thinking it is more scholarly. Not using personal pronouns
results in poorer communication and borders on academic
dishonesty. What really happened was "I found that...," so why not
say so? Take personal credit or blame, and be a real person to your
reader.

For those who need more convincing, here is a direct quotation
from the APA Publication Manual:

An experienced writer can use the first person and the active
voice without dominating the communication and without
sacrificing the objectivity of the research.



The APA Manual also cites the American National Standard
for the Preparation of Scientific Papers:

When a verb concerns an author's belief or conjecture, use
of the impersonal passive ("it is thought" or "it is
suggested") is highly inappropriate. When a verb concerns
action by the author, he should use the first person....

However, do not use too many personal pronouns because 
they draw attention away from the subject and toward the author. 

Flesch (1974) developed a formula for measuring interest by 
using personal words and personal sentences. It is not as well known 
as his readability formula, and the concept is not as widely ac- 
cepted. 

Personal sentences are those sentences aimed directly at
readers. For example, "You should always...." Another type is the
sentence used in dialogue with direct personal reference. For
example, "Sally said...."

Personal words and sentences apply not only to adult writing; 
they also are important in children's writing. When sources as di- 
verse as Rudolf Flesch and the APA Manual encourage you to use 
personal pronouns and personal sentences, you should consider do- 
ing so. 

Imageability

Imageability refers to the ease with which the reader can
visualize the word, phrase, or whole passage. Some writers refer to
this as concreteness. A number of psychological studies, such as
those by Paivio (1969), have found that highly imageable words are
easier to learn and remember. For example, dog, bulldozer, and
mother are high imageable words and is, philosophy, and of are low
imageable words. In the medium range you might find blue,
running, and under.

Phrases, sentences, and whole passages can be ranked for
imageability. Good writers are adept at finding vivid examples.
Abstract statistics are exemplified by a description of a typical
person or product that illustrates the mean or modal findings.
Writers of children's science books are often ingenious in
illustrating basic principles of physics with familiar situations.
Pastors use stories from real life to illustrate the Ten
Commandments, and educators attempt to make teaching principles
more understandable by citing actual classroom incidents.

Metaphors are an attempt to increase imageability. They often 
attempt to give a concrete corollary to a less visual concept. For 
example, "Her personality began to unfold like a rose. It was a hard
tight bud the first day on the island, but each morning after the sun 
arose it opened a bit more until the full flower of womanhood was 
revealed." 

You can improve imageability by adding appropriate pictures, 
diagrams, maps, and graphs to your manuscript. 

Referents 

Improper use of referents makes some writing hard to follow. 
Referents (sometimes called anaphora) can be pronouns (such as 
they and their) or phrases. For example, "the old man" can refer to
Captain Ahab. Referents are words that must refer to something, 
and that something must be clearly understood, usually by having 
been used in the preceding sentence. 

Writers use referents because they save time; it isn't neces- 
sary to continually repeat the full noun. For example, it is a little 
awkward to keep repeating the Lord High Executioner when in the
next sentence you can say he. 

The misuse of anaphora causes trouble, for instance, when
the referent can refer to more than one thing. "They" might refer to
the good guys or the bad guys, and it makes a lot of difference. A
greatly delayed referent can impose a burden on reader memory and
cause a momentary loss of comprehension or story flow. The way out
of these difficulties is simple: repeat the noun.

Older lawyers were famous for jargon and greatly delayed
referents. Early in the document they might use the phrase
"hereinafter referred to as the party of the first part." Pages
later, the reader must remember who was the party of the first
part. Modern word processors have a simple solution. Simply program
the word processor to insert "Jones" every time you come to "party
of the first part."
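
The substitution just described amounts to a plain search and replace. A minimal sketch follows; the function name and the sample sentences are hypothetical, and it assumes the referent is spelled identically each time it appears.

```python
def replace_referent(document, referent, name):
    """Replace every occurrence of a delayed referent with the name it stands for.

    Returns the revised text and the number of replacements made.
    """
    count = document.count(referent)
    return document.replace(referent, name), count

# Hypothetical contract text for illustration.
contract = ("Rent is owed by the party of the first part. "
            "Notice goes to the party of the first part.")
revised, replaced = replace_referent(contract, "the party of the first part", "Jones")
```

Returning the count lets the writer see how often the reader was being asked to remember the original phrase.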



Motivation and Subject Matter 

One thing readability formulas will never be able to judge is
an individual student's motivation to read a particular bit of writing.
Some time ago, I proposed The Readability Principle. Briefly
summarized, it says: High motivation overcomes writing that is hard to
comprehend. What secondary teacher has not witnessed a teenage 
male student with sixth grade reading ability (according to nation- 
ally standardized tests) master a driver's license manual written at 
tenth grade readability? This same teenager has difficulty compre- 
hending his social studies book coupled with reluctance to read it 
for very long. 

Writers who want to be read should find interesting topics. If 
they must write on a difficult topic, they should seek interesting ex- 
amples and applications. 

This goes along with the general injunction to know your
audience. Write directly to someone. Select the proper level of
sophistication, then try to write a little below that level. Best
selling novels are written at an eighth grade level according to
readability formulas, but they are read mostly by high school
graduates. The reason is not that novel buyers are semiliterate or
that they can read only at the eighth grade level, but rather, as
readability formula makers have said all along, "lower readability
scores mean there is an inclination on the part of the reader to
continue reading the material." If business executives want their
memos read or manufacturers want their instruction manuals read,
they need to keep the readability scores low.

Try to be aware of your reader's background knowledge. 
What does the reader bring to the text? You can assume some in- 
tended audiences are familiar with the concept and a brief mention 
is all that is necessary. For other audiences, more explanation is 
necessary. This idea of background knowledge is related to
vocabulary difficulty. Does the audience know the words you are
using? Even beyond vocabulary, the writer should know where the
reader is coming from.



Try Outs

There is one sure way to find out if you have achieved true 
readability. Try the writing on a sample of the intended audience. 

You can try to follow all the principles discussed in this
article, and you can have the article edited or reviewed by peers or
superiors, but nothing proves readability like a try out. If you are writing
for children at a certain grade level, get an average student, a 
slightly below average student, and a slightly above average student, 
and have them read it. Do not go to extremes and use very bright or 
very poor students. 

The same is true in writing for adults. If you are writing
contracts, ballot propositions, or even newspaper articles, try out
samples of your writing on a few members of your intended audience.

Check the comprehension of your try out sample by discus- 
sion and formal or informal questions. If the writing is a set of di- 
rections, see if your sample audience can follow the directions. 

If you haven't communicated effectively, the material needs to 
be rewritten or edited. The ideas from this article are summarized 
on the Writeability Checklist. It can help you, particularly in
rewriting or editing.

"Easy reading is hard writing" is a principle writers have 
known for a long time. But good writing to an appropriately easy 
level can be improved with practice, and try outs on a real audience 
are an excellent source of feedback. 

Legal Status 

In addition to the Plain Language Laws enacted by some 
states and to the Presidential Executive Order, a number of court 
cases have dealt with readability. For example, a man was blinded 
by a drain cleaner when he did not follow directions written on the 
can. 

A large scale class action suit brought in Federal Court
against Medicare (Weinstein, 1984) used a readability formula as a
pivotal instrument. There are some 50,000 Medicare claims filed
every day and, since Medicare pays an average of only 62 percent of
the claimed amount, it is not surprising that some 1,500 appeals are
filed. According to the Fry Readability formula, the notice sent to
these review seekers was found to have a readability ranging from
twelfth to sixteenth grade. Couple this with the fact that 48 percent
of the citizens of New York who are age sixty-five or older have an
eighth grade or less education. Chief Judge Jack B. Weinstein found
that the notices sent out by Medicare were "incomprehensible" and
that these "inadequate notices can be remedied. Defendant
[Medicare] is directed to take prompt action."

It Can Be Done 

There is ample evidence that proper writing or rewriting can
keep the readability of most materials lower than is commonly
supposed. Recently, a student rewrote part of the New Jersey Driver's
License Manual as part of a master's thesis. When the readability
score was lowered two grade levels from eighth to sixth grade, cloze 
scores jumped significantly. A surprising finding was that even 
though the rewritten passages were longer, the students completed 
the cloze tests in less time, indicating that if a passage is easier to 
read it can be read more rapidly (Hunt, 1982). 

The Document Design Center (1982) reports a field test of an 
FCC regulation written in original bureaucratic style and a version 
rewritten to be more readable. Readers' responses were more accu- 
rate and faster when the more readable version was used. 

Another of my students applied a readability formula to a 
front page story in the undergraduate student newspaper and to a 
front page story in the New York Times. He was amazed to find that 
the undergraduate written story was at the seventeenth grade level 
and the New York Times writer wrote at the eleventh grade level. 

Too many people write like that undergraduate, and it simply 
isn't necessary. 

As an interesting summary of many of the points made in this
chapter, the reader might like to see some advice to writers given by 
C.A. McKnight, an editor of the Charlotte Observer: 

1. Use short, simple words. If your writing runs more than
165 syllables per 100 words, you are writing only for
college graduates.

2. Use more one syllable words. Make them your workhorse
words. Make them carry the biggest load. Of 275
words in Lincoln's Gettysburg Address, 196 are only
one syllable.

3. Use familiar words. The Bible uses a vocabulary of only 
6,000 words. 

4. Use personal words. Your stories will come to life when
you sprinkle in a generous supply of words such as you,
girl, mother, doctor, teacher, Joe, Susie, baby.

5. Use concrete words: words that make the reader see,
hear, feel, smell, or taste.

6. Make every word work. Use fewer words. Use them
with greater force. Go through one day's writing after
it's printed. Cross out every unnecessary word, confusing
phrase, garbled sentence, involved paragraph. Then
continue to do that every day, in advance of printing.

7. Avoid technical words. Nontechnical words are clearer, 
and will build a broader base of readership for you. 

8. Create figures of speech. Build them into everyday writ- 
ing. Feed new ones in as old ones wear out. 

9. Use short sentences. They are the lifeblood of simple, 
easy to read writing. If a sentence runs upward of 30 
words, break it up. Even a one word sentence can be 
forceful, emphatic, arresting. 

10. Make sentences active. Put a taboo on passives. Active 
verbs give action to writing, passives bog it down. 

11. Use short, simple paragraphs. Most should introduce or 
contain only one idea. Most should also have only one 
source or viewpoint. 

12. Write to one person. Write every story or feature as if
you were talking to one man, to one woman, to one
child. Picture this person sitting right in front of you as
you talk. Talk to that person in familiar language, words
used every day.

13. Work with one basic idea. Cover many points, but build
them on the framework of one idea. That can make even
a complex subject easy to read about.

14. Try to write affirmatively. Keep your viewpoint con- 
structive. There is always a yes viewpoint in every no 
situation. 

(American Press Institute, 1985) 
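
McKnight's first two rules rest on counts that are easy to estimate. The sketch below is a rough illustration, not a validated measure: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic (an assumption of this sketch) that will miscount some English words.

```python
import re

def count_syllables(word):
    """Estimate syllables by counting vowel groups (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final "e" as silent ("make"), except in "-le" endings ("simple").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def word_stats(text):
    """Return the word count, syllables per 100 words, and one syllable share."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    total = len(words)
    return {
        "words": total,
        "syllables_per_100_words": 100 * sum(syllables) / total,
        "one_syllable_share": sum(1 for s in syllables if s == 1) / total,
    }

stats = word_stats("Use short words. Make them carry the load.")
```

For the short sample sentence, the estimate falls well under the 165 syllables per 100 words that McKnight associates with writing for college graduates.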

In case you prefer to learn from the negative instead of the 
positive, here is a delightful quote from Law Professor Robert Ben- 
son (1984-1985): 

There exist scores of empirical studies showing that most of 
the linguistic features found in legalese cause comprehen- 
sion difficulties. Legalese is characterized by passive verbs, 
impersonality, nominalizations, long sentences, idea stuffed 
sentences, difficult words, double negatives, illogical order, 
poor headings, and poor typeface and graphic layout. Each 
of these features alone is known to work against clear under- 
standing. 

You might note that of the eleven negative characteristics of 
poor writing, readability formulas take into account only two. This 
is why readability formulas are not writer's guides. 

Instead of a summary, you are invited to look over Appendix 
1, a Writeability Checklist that mentions most of the ideas contained 
in this article. The important point is that the checklist contains 
many more factors than the two simple inputs of a readability
formula, sentence length and word length, as indicated by number of
syllables or some other measure. 

For those who do not have a readability formula, Appendix 2
shows the handy Graph for Estimating Readability. It will give you a
fairly reliable estimate of the difficulty of any piece of prose, not
with deadly accuracy, but with accuracy comparable with most
other human psychological measures.

Any readability formula is meant to be used after a piece of
prose is written. Probably the best advice for writers is to write
creatively (some say there is no such thing as uncreative writing),
aiming it at your intended audience. Then use a readability formula.
If it comes out on the proper level, quit.

If your writing is not on the intended level, the Writeability
Checklist will help you simplify it. Few people have trouble writing
material which is difficult to read; sentence combining is a skill
taught in elementary school. It takes art and talent to write in a
simple, clear manner. Perhaps the Writeability Checklist will help you
toward that laudable goal. At the very least, the Writeability Checklist
will mollify some of the critics of readability formulas who fear that
formula makers want all writing to be just short choppy sentences
and short choppy words.

Appendix 1
Writeability Checklist

Vocabulary
  Avoid large or infrequent words
  For high frequency words use the Carroll list or 3,000 Instant Words
  For meaning list use Living Word Vocabulary
  Avoid words with Latin and Greek prefixes/roots
  Avoid jargon
  OK to use technical words but see rules for introducing new terms

Sentences
  Keep sentences short; on the average for general adults keep average
    sentence below 15 words
  Avoid splitting sentence kernel (embedding)
  Keep verb active (avoid nominalizations)
  Watch out for too many commas (may indicate need for two sentences)
  Semicolons and colons may indicate need for new sentence
  Cohesion sometimes aided by longer sentences

Paragraphs
  Keep paragraph short, on the average
  One sentence paragraphs permissible at times
  Indent and line up lists

Organization
  Suit organization plan to topic and your purpose
  Consider SER (Statement-Example-Restatement)
  Use subheads
  Use signal words
  Use summaries

Cohesion
  Increase links between sentences and paragraphs
  Avoid too many different ideas in a short passage

Personal Words
  Use personal pronouns, but not too many
  Use personal sentences (a direct statement to reader or dialogue)

Imageability
  Use more high imageable words (concrete)
  Avoid low imagery words; edit out many
  Use vivid examples
  Use metaphors
  Use graphs whenever appropriate

Referents
  Avoid too many referents
  Replace some referents with the original noun or verb
  Avoid distance between noun and referent
  Don't use referents that could refer to two or more nouns or verbs

Motivation
  Select interesting topics
  Select interesting examples
  Write at a level a little below your audience
  Consider readers' background knowledge

Try Outs
  Try out writing on a sample audience
  Check comprehension by sample audience
  Revise if necessary

Appendix 2

Fry Graph for Estimating Readability-Extended

DIRECTIONS: Randomly select 3 one hundred word passages from a book or an
article. Plot average number of syllables and average number of sentences
per 100 words on graph to determine the grade level of the material. Choose
more passages per book if great variability is observed and conclude that
the book has uneven readability. Few books will fall in gray area but when
they do grade level scores are invalid.

Count proper nouns, numerals and initializations as words. Count a syllable
for each symbol. For example, "1945" is 1 word and 4 syllables and "IRA" is
1 word and 3 syllables.

EXAMPLE:               SYLLABLES   SENTENCES

1st Hundred Words         124         6.6
2nd Hundred Words         141         5.5
3rd Hundred Words         158         6.8

AVERAGE                   141         6.3

READABILITY 7th GRADE (see dot plotted on graph)
Reproduction Permitted. No Copyright.
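
The arithmetic in the example above can be checked with a few lines of code. This is a minimal sketch with hypothetical function names; it computes only the two averages to be plotted and does not read the grade level, since the curves of the graph itself are not reproduced here.

```python
def symbol_syllables(token):
    """Syllables for a numeral or initialization: one per symbol, per the directions."""
    return sum(1 for ch in token if ch.isalnum())

def fry_averages(samples):
    """Average the (syllables, sentences) counts from several 100 word passages."""
    if not samples:
        raise ValueError("at least one passage is required")
    n = len(samples)
    avg_syllables = sum(syllables for syllables, _ in samples) / n
    avg_sentences = sum(sentences for _, sentences in samples) / n
    # Round as in the worked example: whole syllables, sentences to one decimal.
    return round(avg_syllables), round(avg_sentences, 1)

# The three passages from the example above.
example = [(124, 6.6), (141, 5.5), (158, 6.8)]
averages = fry_averages(example)
```

The resulting pair is the point to plot on the graph, which the example places in the seventh grade area.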

Readability: Its Future

Marilyn R. Binkley



New Ways of Assessing Text Difficulty 



As interest in textlinguistics increased during the late 1970s and 
early 1980s, researchers developed new descriptions of text 
structure. At the same time, researchers became increasingly
disappointed with the ability of classic readability formulas to describe
adequately the features of text that influence comprehension or to 
guide the production of improved text. Together, these trends are 
influencing reading comprehension research. 

The intent of the new text analysis systems is significantly 
different from that of classic readability formulas. The new systems 
attempt to predict ease of reading, to test hypotheses about thought 
processes, and to guide production. Therefore, instead of focusing 
on factors of text correlated primarily with reading difficulty, the 
new approaches try to identify text factors that influence learning 
and memory. 

Many of the new approaches have been developed by research- 
ers in fields other than reading, particularly rhetoric (D'Angelo, 
1975; Flower & Hayes, 1977), linguistics (Fillmore, 1968; Grimes, 
1975; Halliday & Hasan, 1976; van Dijk, 1977), psychology 
(Bartlett, 1932; Dawes, 1966; Frederiksen, 1972; Mandler & Johnson,
1977; Rumelhart, 1975), and artificial intelligence (Charniak,
1972; Schank, 1977; Simmons, 1978). However, they have been
useful to reading researchers because they make possible new ways of
thinking about reading. They also have potential for measuring text 
difficulty. 

The opinions and suggestions expressed in this paper are those of the author and do not neces- 
sarily reflect the positions or policies of the U.S. Department of Education. 

This chapter provides a review of the most significant new
text analysis systems, describing each type, discussing its useful- 
ness to reading research, and exploring its applicability as a read- 
ability estimate. The chapter concludes with a proposed 
methodology that extends cohesion analysis into an ease of reading 
measure. 

Defining Text and Language Structures 

There are two key attributes in all text analysis systems: the 
unit of discourse and the kind of relationship. 

Researchers have proposed a number of divisions of dis- 
course. For example, van Dijk (1979) differentiates between macro 
and microstructures. Armbruster (1984) looks at global or local co- 
herence. Meyer (1981) identifies three text levels: sentence, para- 
graph, and top level. These differentiations focus on how much of a 
text is being considered, how pieces of text relate, and which types 
of rhetorical structures are used in developing a text. 

Similarly, at least three distinct sets of relationships operate 
within text. Grimes (1975) identifies the structures as content or se- 
mantic, cohesive, and staging. According to Grimes, these relation- 
ships interact in discourse, causing coherent text to form in such a 
way that the theme is selected from already introduced information 
and then related cognitively and thematically to the rest. However, 
each set of relationships may be studied separately and examined at 
either a micro or macrolevel. 

Viewed in this broader context, we can categorize text analy- 
sis systems along the two dimensions noted— the unit of discourse 
and the kind of relationships examined. Figure 1 is a simple model 
of such a categorization. 

The Figure illustrates that, in theory, text analysis systems 
can be devised to include any combination of study units and types 
of relationships. For example, Frederiksen's semantic networks
system (described later) focuses primarily on the content structure at a
microlevel. Not all such combinations have been fully explored. To 
date, most text oriented research has focused on content structure, 
which may be described at both the macro and microlevels. In
contrast, research based on the cohesive structure of text has been
confined to the microlevel, due perhaps to a limited understanding of
these systems. As work by Hasan (1980) attests, cohesion chains
operate at both micro and macrolevels of text.

Figure 1
Categories of Text Analysis Systems

                                   Level of Discourse
Types of Relationships    micro                     macro

semantic                  semantic networks         story grammar
                          propositional networks    rhetorical structures
cohesive                  cohesion analysis
staging

The Content Structure 

Content structure refers to the cognitive structure of text— the 
semantic, or meaning, aspects. It most clearly represents the infor- 
mation and complexities of text. 

Attempts to represent the content of existing passages include 
set relations (Dawes, 1966), linear relationship structures (Frase, 
1973), propositions (Kintsch, 1974), and networks (Frederiksen, 
1975a). These operate primarily on the microlevel. Another ap- 
proach, constituency grammars, attempts to identify functional 
units between the proposition (microlevel) and the passage (macro- 
level). Examples include story grammars and rhetorical predicates 
(Meyer, 1975; Rumelhart, 1975). 

Frederiksen's (1975b) semantic networks and Kintsch's
(1974) propositional analysis are at the foreground of content
structure analysis. Both are based on Fillmore's (1968) notion of case,
which assumes that deep structure or meaning does not vary with
the surface structure of sentences. For example, consider two
sentences:

John opened the door with the key.

The door was opened by John with the key.

Although the surface and syntactical structures differ, each of the 
noun phrases retains the same case or role relationship. By examin- 
ing how various cases relate to the verb phrase and plotting this rela- 
tionship, the meaning of a passage can be separated from its surface 
form. 

This example from Kintsch's propositional analysis demon- 
strates how these systems work. Consider the simple sentence: 

Mary is baking a cake.

In propositional analysis, this would be recorded as

(Bake A:Mary, O:cake)

where A is the agent and O is the object. Any surface structure that 
has Mary baking the cake can be reduced to this relationship. 
Therefore, when comparing various versions of the same story, the 
analyst can determine if the same propositions were represented. 
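
One way to picture this comparison is to store each proposition as a small record and compare the records rather than the sentences. The sketch below is only an illustration: the class and function names are hypothetical, the propositions are coded by hand (no parsing is attempted), and the passive sentence is an invented paraphrase of the example.

```python
from typing import NamedTuple

class Proposition(NamedTuple):
    """A predicate with its agent (A) and object (O) cases."""
    predicate: str
    agent: str
    obj: str

def notation(p):
    """Format a proposition in the notation used above."""
    return f"({p.predicate.capitalize()} A:{p.agent}, O:{p.obj})"

def shared_propositions(version_a, version_b):
    """Return the propositions that two hand-coded versions have in common."""
    return set(version_a) & set(version_b)

# Hand-coded analyses of two surface forms (assumed, not produced by a parser).
active = Proposition("bake", agent="Mary", obj="cake")   # "Mary is baking a cake."
passive = Proposition("bake", agent="Mary", obj="cake")  # "A cake is being baked by Mary."
```

Because both surface forms reduce to the same record, an equality test is enough to show that the two versions express the same proposition.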

Meyer and Rice (1984) have created an analysis system that 
combines the micropropositional elements of the Kintsch and Fred- 
eriksen approaches with a macropropositional element. They iden- 
tify five types of rhetorical relations in text: causal, problem and 
solution, comparison, collection, and description. These rhetorical 
structures facilitate the segmentation of text into a hierarchical form
that can be used to identify the top level structure of expository pas- 
sages. 

Researchers have used these three approaches to determine 
the effects of variation of structure on students' understanding, 
learning, and retention of content. The approaches work particularly 
well for representing text and comparing protocols to that text. As 
such, they are useful for determining learning in relation to a partic- 
ular text. 

Research using these systems of text analysis has led to find- 
ings that could have an impact on text production. 

• Ideas located at the top levels of a structural analysis of 
prose are recalled and retained better than ideas located at 
the lower levels (Bartlett, 1978; Britton et al., 1979;
Duchastel, 1979; Haring & Fry, 1979; Meyer, 1971, 1975,
1977; Swanson, 1979).

• Different items of information located high in the structure 
are more likely to be integrated in memory than items lo- 
cated low in the structure (Walker & Meyer, 1980). 

• The type and structure of relationships among ideas in 
prose dramatically influence recall when they occur at the 
top levels of the structure; however, when the same rela- 
tionships occur low in the structure they have little effect 
on recall (Meyer, 1975). 

• Different types of relationships at the top levels of the
structure differentially affect memory (Meyer & Freedle,
1984).

• Students who are able to identify and use these top level 
structures remember more from their reading than those 
who do not (Meyer, 1979; Meyer et al., 1980). 

• Training in how to recognize and use these top level struc- 
tures improves recall for text materials (Bartlett, 1978). 

• Overgeneralizations, pseudodiscriminations, and text
generated inferences occur at the time of comprehension,
while elaborations occur during recall (Frederiksen,
1975).

• Explicit statements of logical relationships facilitate
comprehension in poorer readers (Marshall & Glock, 1978).

However, these systems of analysis do not easily lend themselves to 
determinations of relative reading levels. 

Story grammars, on the other hand, have led to experimental
readability measures. Story grammar is based on the premise that a 
reader understands the organization and elements of a story, inde- 
pendent of the specific content. The story grammar represents the 
important elements in a story and specifies the allowable ways ele- 
ments may be arranged (Black & Wilensky, 1979). 

Many story grammars have been proposed. Almost all de- 
scribe stories as consisting of a setting and a series of one or more 
episodes. Each episode tends to have an internal structure made up 
of a problem/solution or a goal/action/outcome. In addition to
rewriting the story as a listing of rules, a reader may redraw a story
grammar as a tree diagram that illustrates the hierarchical relation- 
ship among the constituent parts (Mandler & Johnson, 1977; Ru- 
melhart, 1975; Stein & Glenn, 1979). As such, the story grammar 
tends to have a top down or macroproposition orientation similar to 
Meyer's rhetorical predicates. 
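
The tree form of a story grammar can be sketched as a nested structure. The example below is a simplified illustration, not any one published grammar: the node labels follow the general description above (a setting plus an episode made up of goal, action, and outcome), and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One constituent of a story, with any constituents nested under it."""
    label: str
    children: List["Node"] = field(default_factory=list)

def outline(node, depth=0):
    """Return the tree as indented lines, one per constituent."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

# A one episode story: setting, then goal/action/outcome.
story = Node("Story", [
    Node("Setting"),
    Node("Episode", [Node("Goal"), Node("Action"), Node("Outcome")]),
])
```

Printing the lines returned by outline(story) shows the hierarchy, with the top level constituents at the left margin and the parts of the episode indented beneath it.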

Meyer and Rice (1984) point out that, in reading research, a 
given story is analyzed so the components of the passage are identi- 
fied according to their role in the story. Stories then can be com- 
pared based on their structure. This has led to some interesting 
findings.

• Recall is easier for a second story with the same structure
as an earlier story (Thorndyke, 1977).

• Comprehensibility ratings of stories can be predicted 
(Bower, 1976; Rice, 1978). 

• The sorts of summaries subjects will make of target stories
can be described (Kintsch, 1977; Kintsch & van Dijk,
1975; Rumelhart, 1977).

• The items that will be remembered from a story can be
predicted (Mandler & Johnson, 1977; Thorndyke, 1977).

In contrast to researchers aiming at understanding the reading 
process and story grammar acquisition, others have attempted to 
convert grammars into "a quantitative means of predicting the read- 
ability of a story." For example, Templeton and Mowery (1985) ana- 
lyzed stories according to Mandler and Johnson's grammar and 
developed a prediction formula based on the tabulation of different 
types of basic nodes, weighting nodes according to their level in the 
underlying structure of the story. When they compared their results
with the Fry formula, they found no relationship; as difficulty in- 
creased according to the Fry formula, their underlying structure or 
degree of difficulty remained the same across grade levels. They re- 
fined their formula and tested it by having subjects read silently and 
then retell the story. Analyses revealed no significant differences be- 
tween recalls as a function of difficulty levels. Although this effort 
has resulted in a comparable measure of the difficulty of texts, at 
present it does not appear to predict appropriate placement of text 
materials in the same way readability formulas do. 






The line of research related to the content structure of text has 
been very productive. It has led to a better understanding of the 
comprehension process and to the development of guidelines for text 
production (Armbruster, 1984). However, to date it has not led to a 
replacement for readability formulas. 

The Cohesive Structure 

In contrast to content structure, cohesive structure serves as 
the syntax of discourse. It is concerned with the interrelationships of 
ideas (Meyer & Rice, 1984). In effect, the cohesive structure is a 
roadmap to understanding. Although the reader may not understand 
the words specific to a particular field, the cohesive devices form 
the context in which words have meaning. They are "[the] 
mechanisms by which authors tie their materials together" (p. 325). While 
they are not the content to be learned, the cohesive structures help 
students order and organize new concepts. 

Cohesion, as defined by Halliday and Hasan (1976), occurs 
when the interpretation of some element in the discourse is depen- 
dent upon the interpretation of another. This occurrence of a pair of 
related items forms a tie across the boundaries of sentences. It may 
be achieved through syntactical markers, such as conjunctions, or 
through semantic relationships, such as pronouns. This connection 
of terms across sentences represents a kind of "linguistic mortar" 
(Tierney & Mosenthal, 1982) that clearly defines the semantic con- 
tinuity of a text. 

Halliday and Hasan distinguish five classes of cohesive ties 
shown with simple examples in Figure 2. 



Figure 2 
Examples of Cohesive Ties 

Class          Example 
Reference      he, that, there 
Substitution   one, same, do so 
Ellipsis 
Conjunction    and, or, later 
Lexical        kayak, boat 

Their system of cohesion analysis provides multilevel information. 
It taps the commonly recognized cohesive devices and describes 
semantic networks as well. These devices generally operate within or 
between adjacent paragraphs (the microlevel). Through the coding 
of lexical items, chains mark the semantic continuity of any given 
passage across a larger level of organization. Regrettably, lexical 
analysis is the weakest point in the methodology because lexical ties 
are much more dependent on subjective judgment and prior expe- 
riential knowledge than the others. However, lexical ties are directly 
related to vocabulary knowledge and as such are a crucial element in 
analyzing comprehensibility. For example, preliminary research us- 
ing this analysis system with college level textbooks indicates that 
substitution and ellipsis are rarely present in expository text, while 
lexical items predominate, accounting for more than 54 percent of 
all ties (Binkley, 1983). 

Using Cohesion Analysis to Assess Readability 

Cohesion analysis as described by Halliday and Hasan re- 
duces text to counts of types of ties and distances. Work by Binkley 
and Chapman extends the applicability of cohesion analysis to as- 
sessments of readability. 

Binkley and Chapman have developed a methodology that as- 
sesses the match between students and text materials intended for 
instruction. Their methodology looks at attributes of written text, 
comparing these attributes to students' development in much the way 
readability formulas were intended. However, it goes beyond cur- 
rent readability formulas. 

• It accounts for more attributes of text than vocabulary diffi- 
culty and sentence length. 

• It qualitatively evaluates reader performance. 

• It provides diagnostic information on individual, small 
group, class, or school levels that could guide the planning 
of instruction or production of text materials. 

The assessment process is intended for use when reading to 
learn is the objective. In such a setting there may be multiple goals 
for instruction, including learning how to learn from text and 
learning specific content. Consequently, it is essential that the text be 
within a manageable range for student readers. In addition, the 
teacher must know where and when a mismatch occurs between the 
author's assumed and the reader's actual ability so appropriate direct 
instruction may be provided. This assessment procedure pinpoints 
types of problems students are having. Teachers can then tailor 
instruction more appropriately. 

There are essentially three stages to the assessment 
procedure: text analysis, design of a modified cloze procedure to reflect 
the attributes of the particular text, and administration and scoring 
of the cloze procedure. 

What Cohesion Analysis Says about Specific Texts 

Both similar and different attributes of text are assessed with 
readability formulas and with cohesion analysis. In the former, only 
two attributes are considered: vocabulary difficulty, which is 
measured against lists of familiar words or by counting the number of 
syllables, and sentence difficulty, which is measured by the average 
number of words per sentence. 
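
As a rough illustration of the two attributes a classic formula counts, the sketch below computes the average number of words per sentence and the share of words missing from a familiar-word list. The tiny `FAMILIAR` set, the function name, and the sentence-splitting rule are assumptions made for illustration; no published formula or word list is reproduced here.

```python
import re

# Hypothetical familiar-word list; published lists are far larger.
FAMILIAR = {"the", "people", "of", "a", "town", "they", "named",
            "river", "in", "their", "new", "home"}

def surface_measures(text, familiar=FAMILIAR):
    """Return (average words per sentence, fraction of unfamiliar words)."""
    # Split on sentence-ending punctuation; this simplification ignores abbreviations.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return 0.0, 0.0
    avg_sentence_length = len(words) / len(sentences)
    unfamiliar = sum(1 for w in words if w not in familiar)
    return avg_sentence_length, unfamiliar / len(words)
```

Both numbers describe the text alone; nothing in them reflects how ideas are tied together across sentences, which is the gap cohesion analysis is meant to address.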

In cohesion analysis more information is provided about 
vocabulary and syntactic complexity. The count of lexical items 
indicates the number of repetitions, the number of synonyms, the use of 
superordinate and subordinate terms, and the use of general classes 
of words. This type of count can be used to gather information 
pertaining to the ways children acquire word meanings, as well as their 
recognition of particular words. The count of conjunctive and refer- 
ence items helps to assess the number and complexity of syntactical 
forms used in specific texts. 

The pattern of ties that occurs across sample passages from a 
specific text reveals a great deal. For example, although each type of 
tie may be present in all written text, the distribution of ties differs 
among academic disciplines (Binkley, 1983). Science writings tend 
to repeat the same noun while social science writings depend more 
heavily on synonyms and superordinate and subordinate terms. Var- 
iation also occurs in the prevalence of types of conjunctions. These 
differences are neither good nor bad; they do, however, represent 
differences in argumentation style that may have implications for 
readers. 

While there is evidence suggesting patterns of ties representative 
of each discipline (Binkley, 1983), there are even stronger 
indications that a specific pattern exists within a single textbook. This 
pattern might be called the signature or register of that book. 

Binkley's method for determining a text signature is rooted in 
Halliday and Hasan's (1976) system for counting ties. Samples of 
text are randomly chosen from a textbook (a sample is defined as a 
unit of discourse that begins with a heading or subheading and ends 
at the next heading) and are analyzed with each tie and the distance 
between ties recorded. The distribution of types of ties and distances 
across the samples constitutes the signature or register of the text- 
book. It is this proportional variation and the specific subclasses of 
ties, as well as the number of ties that cohere over longer distances 
(i.e., more than five sentences) that differentiate one text from the 
other. As such, the register includes the specific content vocabulary, 
the relationship between ideas, the argument structure manifested 
by the use of reference and conjunctive ties, the syntax of the pas- 
sages represented by the grammatical relationships of the ties, and 
syntactical markers of the macrostructure. Consequently, a register 
may be considered so distinctive as to constitute a separate genre. 
These factors, along with a reader's prior knowledge, influence the 
comprehensibility of a text. 

The Instrument Design 

Reading is an interaction between an author (who has made 
certain assumptions about an audience) and readers (who may or 
may not have the assumed attributes). Therefore, an assessment of a 
text separate from an assessment of the readers' characteristics can- 
not give a measure of the text's comprehensibility. In designing an 
assessment procedure, the emphasis should be on gathering 
information about text in relation to a particular body of students. To do 
so, the assessment instrument should relate the salient features of 
the text with the readers' ability to comprehend. The instrument will 
thus yield information about both the reader and text. 

Measures of reading comprehension have taken many forms 
over the years. Cloze was one of the first techniques used to match 
students with appropriate reading materials. When used in schools 
as an instructional and testing tool for reading comprehension, cloze 
typically begins with an excerpt from which every nth word is 
deleted. Students are expected to fill in the resulting blanks by 
selecting a word. Since its introduction into education by Taylor (1953), 
much research has been conducted to test the cloze procedure, both 
as a device to measure comprehension and as a measure of 
readability (Beard, 1967; Bormuth, 1968; McKenna, 1978; Nesvold, 
1972). More recently, research has focused on using cloze as an 
instructional technique (Jongsma, 1980; Kennedy & Weener, 1973). 
The cloze procedure has been shown to measure the difficulty of a 
text in a manner that is unlike readability formulas. Word and 
sentence length are not the variables considered. Instead, cloze 
measures a reader's response to linguistic variables, the language 
structure of text. 
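
The standard deletion rule can be sketched in a few lines. This is a minimal illustration that assumes whitespace-separated words and a default of every fifth word; the function name, the blank marker, and the default value of `n` are choices made here, not details taken from the studies cited.

```python
def make_cloze(text, n=5, blank="_____"):
    """Delete every nth word; return (cloze text, list of deleted words)."""
    words = text.split()
    deleted = []
    # Positions n-1, 2n-1, ... are the nth, 2nth, ... words.
    for i in range(n - 1, len(words), n):
        deleted.append(words[i])
        words[i] = blank
    return " ".join(words), deleted
```

Because the rule is purely positional, the deleted words fall wherever the count lands, regardless of the role each word plays in tying the text together.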

Shanahan, Kamil, and Tobin (1982) questioned the ability of 
cloze tests to measure the use of information across sentence 
boundaries. In their study, they administered three variations of cloze: 
standard cloze passages; the same passages with scrambled sentence 
sequences; and passages constructed by embedding single sentences 
from the original passages in other, nonsupportive text. They found 
no performance differences due to sentence order or to the presence 
of supportive text. Therefore, they concluded the cloze procedure 
might be limited in measuring the integration of intersentential in- 
formation. They do suggest that "It might be possible to design 
cloze tests to measure this ability." 

In contrast to a standard cloze procedure, the deletions in 
Binkley and Chapman's assessment instrument are based on the sig- 
nature or register of the textbook under consideration. Therefore, 
the assessment instrument tests the language demands of the 
textbook, going beyond intrasentential to intersentential information integration. 
When coupled with the grading system, the procedure allows for 
qualitative descriptions of student performance. 

Three criteria are considered in making deletions. First, be- 
cause the distribution of ties in the textbook marks the syntax preva- 
lent in the text, deletions are made to reflect the proportion of ties in 
the textbooks. For example, if pronouns are used 20 percent of the 
time throughout the text, 20 percent of the deletions in the modified 
cloze procedure should be pronouns. In this manner, the assessment 
instrument highlights the language structures that set this writing 
apart from other writings, because it marks the relationships be- 
tween ideas. 

Second, consideration is given to the pattern of distance be- 
tween ties. Ties are generally between adjacent sentences. They oc- 
cur less often between sentences two to three sentences apart. Rarely 
do they occur between sentences greater than five sentences apart. 
Ties in adjacent sentences reflect micro or local coherence. In 
contrast, ties across paragraph boundaries tend to reflect macro or 
global coherence. Therefore, deletions should represent the propor- 
tion of various distances. This dimension gives information about 
the reader's use and understanding of the micro and macrostructures 
of particular texts. 

Finally, consideration is given to making deletions that relate 
to and trace the major chains central to a particular excerpt. Here 
the assessment procedure relates closely to the lexical chains and the 
semantic knowledge necessary for understanding specific content. 
This, too, is an essential element of the macrostructure. 

Scoring Student Responses 

The Binkley and Chapman instrument is administered in the 
same manner as a standard cloze procedure. However, the scoring 
system is significantly different. In a standard cloze procedure, re- 
sponses are right if the replacement word is the exact word deleted. 
The student's score is the number of responses that are the same as 
the original. Based on this number, a book is determined to be at the 
independent, instructional, or frustration level for a particular stu- 
dent. No diagnostic information is provided as to type of errors or 
difficulty. 
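
The exact-word rule described above amounts to a simple count. The sketch below assumes one response per deletion, with `None` standing for an omission; ignoring case and surrounding spaces is an assumption made here rather than part of the standard procedure.

```python
def exact_match_score(responses, answers):
    """Count responses identical to the deleted word (case-insensitive)."""
    if len(responses) != len(answers):
        raise ValueError("one response is needed per deletion")
    return sum(
        1
        for r, a in zip(responses, answers)
        if r is not None and r.strip().lower() == a.strip().lower()
    )
```

A response such as town for settlement earns nothing under this rule, which is why the single score carries no diagnostic information.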

In the Binkley and Chapman system, student responses to the 
cloze procedure are placed on a continuum from inappropriate (i.e., 
no relation to the materials) to syntactically correct to syntactically 
and semantically correct (Chapman, 1979a, b, c; 1980; 1983a, b). 
The initial analysis of student responses yields a frequency of re- 
sponse for each deletion. All responses are reported with a count of 
how many students chose each response. The responses are 
recorded so they are positioned along a continuum. The criteria for 
assigning a response to a position, which were established by 
Chapman (1983c), are summarized in Figure 3. Based on this analysis of 
student responses, the reader's abilities may be characterized on the 
reading development continuum (pictured in Figure 4) developed by 
Chapman (1983c). 

Figure 3 
Summary of Criteria for Allocating Responses 

Position 1 (P1) Prereading 

• omissions 

• unrecognizable responses 

• response from VPF 

• response unacceptable in one clause element 

P1 Transition (response is partially acceptable) 

• achieved by ignoring other words in clause element 

• achieved by overrunning punctuation and combining with word(s) from 
following clause elements 

Position 2 (P2) Beginning Reading (clause structure perceived) 

• word complex responses: acceptable in one clause element only (i.e., all 
other contexts are ignored) 

• group complex responses: acceptable in clause (complex) but lacking 
evidence of cohesion and register 

• clause complex responses: acceptable in clause (complex) but lacking 
evidence of cohesion and register 

P2 Transition 

• response shows evidence of cohesion and appropriate register but contains 
errors in lexogrammatical structure(s) 

Position 3 (P3) Developing Reading Fluency 

• responses indicate that clause structure perceived; cohesion perceived but 
achieved differently from author; possible errors of field mode and tenor 

• responses indicate that clause structure perceived; register is appropriate but 
cohesion achieved differently from author 

P3 Transition 

• structure perceived, register appropriate, cohesion perceived but not author's 
word 

Position 4 (P4) Fluent Reading 

• criterion met so that either author's or teacher's word is provided 

Reprinted by permission of L.J. Chapman, 1983. 

Qualitative analyses of the types of errors make possible an 
assessment of the problems students may have with a particular text, 
i.e., whether their misunderstandings are based on vocabulary/semantics, 
syntactical/language structures, or organization. Consequently, 
teachers have more information for determining 
appropriate instruction. 

Figure 4 
The Reading Development Continuum 

Reprinted by permission of L.J. Chapman, 1983. 

The coding described allows for interpretation of responses to 
individual deletions, each sample, or across samples for individuals 
or class groups. In this manner, cohesion analysis results in a 
measure of readability that provides more diagnostic information than 
classic readability formulas and can account for intersentential 
integration of information. 
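
To make the coding concrete, the sketch below tallies the coded responses to a single deletion by position on the continuum. The data follow item 7 of Figure 7, which lists only the five most frequent answers and the omissions, so 51 of the 53 responses are covered. Placing omissions at position 1 follows Figure 3; the data layout and function name are assumptions, and transition codes such as 2t would need a string key rather than an integer.

```python
from collections import Counter

# (response, number of students, position) for one deletion.
# None stands for an omission, coded at position 1 (Prereading).
ITEM_7 = [
    ("they", 44, 4),
    ("most", 1, 3),
    ("many", 1, 3),
    ("James", 1, 3),
    ("nobody", 1, 3),
    (None, 3, 1),
]

def position_percentages(coded):
    """Return {position: percent of tallied responses}, in whole percents."""
    counts = Counter()
    for _response, n, position in coded:
        counts[position] += n
    total = sum(counts.values())
    return {p: round(100 * c / total) for p, c in sorted(counts.items())}
```

The same tally can be run over all deletions in a sample, or over one student's responses, to move between the item, sample, and group levels of interpretation.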



An Example of the Use of the Binkley and 
Chapman Procedure 

The Binkley and Chapman procedure was recently tested on a 
new fourth grade social studies textbook. The following discussion 
of that pilot test will clarify use of the procedure. 

For the purpose of the pilot test, five random samples from 
the textbook were analyzed, and each tie and its distance were 
recorded using the method prescribed by Halliday and Hasan 
(1976). The counts were then summarized so that distribution of 
ties, in their broadest categories, could be assessed. This summary 
of the distribution of ties appears in Figure 5. 

Using a chi-square test, we determined that the distribution of 
ties was homogeneous across the samples. Sample B represented the 
distribution most closely and was selected as the excerpt to be used 
for the pilot test. 
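
A chi-square check of this kind can be sketched as follows. The counts are read from Figure 5 on the assumption that the first five values in the Reference, Lexical, and Total rows belong to samples A through E. The remaining classes are grouped into an "other" row obtained by subtraction from the column totals. That grouping is made here only because the individual cells are not all legible, and it may differ from the test the authors actually ran.

```python
SAMPLES = ["A", "B", "C", "D", "E"]
TOTALS = [120, 110, 118, 62, 105]
COUNTS = {
    "reference": [41, 30, 25, 14, 32],
    "lexical": [73, 69, 80, 44, 60],
}
# Substitution, ellipsis, and conjunction combined, by subtraction (assumption).
COUNTS["other"] = [
    t - r - x for t, r, x in zip(TOTALS, COUNTS["reference"], COUNTS["lexical"])
]

def chi_square_by_sample(counts, samples):
    """Return (chi-square statistic, {sample: its contribution})."""
    col_totals = [sum(row[j] for row in counts.values()) for j in range(len(samples))]
    grand = sum(col_totals)
    contrib = {s: 0.0 for s in samples}
    for row in counts.values():
        row_total = sum(row)
        for j, s in enumerate(samples):
            expected = row_total * col_totals[j] / grand
            contrib[s] += (row[j] - expected) ** 2 / expected
    return sum(contrib.values()), contrib

statistic, contrib = chi_square_by_sample(COUNTS, SAMPLES)
# The sample contributing least to the statistic departs least from the pooled distribution.
closest = min(contrib, key=contrib.get)
# df = (3 - 1) * (5 - 1) = 8; the .05 critical value for 8 df is 15.507.
homogeneous = statistic < 15.507
```

Under these assumptions the statistic stays below the critical value, and the sample with the smallest contribution is the natural candidate for the test excerpt.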

As described, deletions were made from the sample so that 
seven (27 percent) deletions were reference items, two (7 percent) 
were conjunctions, and sixteen (63 percent) were lexical. Within 
each class of ties, specific deletions depended upon the number 
within each subclass of ties and the number of ties at a given dis- 
tance. For example, in the case of lexical ties, the distribution re- 
flected the percentage of ties using the same item, general term, and 
collocation. The result was a cloze procedure with twenty-five dele- 
tions. 
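
One way to turn a distribution of ties into a fixed number of deletions is proportional allocation with largest remainders, sketched below. The rounding rule is an assumption, since the text does not say how fractional deletions were resolved; the class totals are the ones reported in Figure 5.

```python
def allocate_deletions(tie_counts, n_deletions):
    """Split n_deletions across tie classes in proportion to their counts,
    using the largest-remainder method (integer arithmetic only)."""
    total = sum(tie_counts.values())
    quotas = {k: c * n_deletions // total for k, c in tie_counts.items()}
    remainders = {k: c * n_deletions % total for k, c in tie_counts.items()}
    leftover = n_deletions - sum(quotas.values())
    # Hand the remaining deletions to the classes with the largest remainders.
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[k] += 1
    return quotas

# Class totals from Figure 5.
TIES = {"reference": 142, "conjunction": 35, "lexical": 326}
```

With twenty-five deletions this reproduces the split of seven reference, two conjunction, and sixteen lexical items. The same function could be applied within a class to apportion deletions among subclasses or tie distances.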



Figure 5 
The Distribution of Ties across Samples 
Fourth Grade Social Studies Textbook 

Reference: 41, 30, 25, 14, 32, 142 
Substitution: 1 
Ellipsis: 2 
Conjunction: 4, 9, 10, 35 
Lexical: 73, 69, 80, 44, 60, 326, 63 
Total: 120, 110, 118, 62, 105, 515 








Figure 6 

Distribution of Responses along Reading Development Continuum 



P1 Prereading            14 percent 
P2 Beginning Reading     17 percent 
P3 Developing Reading    42 percent 
P4 Fluent Reading        28 percent 

Percentages represent the number of responses in each category compared with a total of 
1,325 responses. 



The Results 

Fifty-three fourth grade students were tested. Their responses 
were scored as outlined. 

Based on analysis of the distribution of responses (fifty-three 
responses to each of the twenty-five items) along the reading devel- 
opment continuum (as outlined in Figure 6), we conclude that this 
text is well matched with this group of students. Less than one-third 
of the student responses were below the developing reading level. 
Six of the subjects were non-English speaking students who ac- 
counted for a large proportion of omissions and prereading re- 
sponses. 

Examining the frequency of responses, we determined that 
the fifty-three students had little difficulty with reference cohesion 
types in this text. This is evident by the frequency of correct re- 
sponses to deletions requiring pronouns (Figure 7). 

The omission rate is notably low. Where there were high 
numbers of omissions, as in items 4 and 25, we believe the result 
was due to the newness of the form for fourth graders. Both these 
items required students to use also. 

They named the river the James, in honor of their king, 
James I. They -4- named their settlement Jamestown in his 
honor. 

By 1733, Jamestown was only one of many English settlements 
in Virginia. The English had -25- started three new 
colonies south of Virginia - North Carolina, South Carolina, 
and Georgia. 

Figure 7 

Frequency of Student Responses 

                Five Most Frequent Answers 

Author's word   First       Second      Third       Fourth      Fifth              Omissions 

1. came         wanted      decided     had         came        started            1 
                N = 19      N = ?       N = 5       N = 3       N = 3 
                P = 3       P = 3       P = 3       P = 4       P = 3 

2. the          some        those       English     many/few    early              3 
                N = 3       N = 3       N = 2       N = 2       N = 1 
                P = 3       P = 3       P = 2t      P = 3       P = 2t 

3. they         they        settlers    Indians     was         Jamestown          5 
                N = 35      N = 2       N = 2       N = 2       N = 1 
                P = 4       P = 2t      P = 2       P = 2       P = 2 

4. also         had         also        all         finally     people             12 
                N = 14      N = 8       N = 5       N = 2       N = 2 
                P = 3       P = 4       P = 3       P = 3t      P = 2t 

5. settlement   town        settlement  king        country     city/land          5 
                N = 18      N = 5       N = 5       N = 4       N = 3 
                P = 3       P = 4       P = 1       P = 3       P = 3 

6. colonists    people      settlers    king        settlement  group/men          2 
                N = 20      N = 8       N = 4       N = 3       N = 2 
                P = 3       P = 3t      P = 2       P = 2t      P = 3 

7. they         they        most        many        James       nobody             3 
                N = 44      N = 1       N = 1       N = 1       N = 1 
                P = 4       P = 3       P = 3       P = 3       P = 3 

8. colonists    people      settlers    explorers   men         villagers          2 
                N = 26      N = 12      N = 4       N = 4       N = 1 
                P = 3       P = 3t      P = 3       P = 3       P = 3 

9. England      England     Jamestown   rest        relax       sleep              5 
                N = 16      N = 9       N = 3       N = 2       N = 2 
                P = 4       P = 2       P = 3       P = 3       P = 3 

10. stayed      lived       stayed      worked      were        belonged/remained  5 
                N = 22      N = 15      N = 2       N = 3       N = 2 
                P = 3       P = 4       P = 3       P = 2t      P = 3 

11. arrived     came        arrived     John        began       called             3 
                N = 38      N = 6       N = 2       N = 1       N = 1 
                P = 3       P = 4       P = 2       P = 2       P = 2 

12. his         his         the         who         was         Smith              2 
                N = 43      N = 5       N = 1       N = 1       N = 1 
                P = 4       P = 3t      P = 2t      P = 1t      P = 1 

13. crop        way         crop        farm        animal      place/tobacco      6 
                N = 18      N = 5       N = 4       N = 3       N = 2 
                P = 2       P = 4       P = 2       P = 2       P = 3 

14. Rolfe       he          Rolfe       they        John        John Rolfe         5 
                N = 15      N = 10      N = 8       N = 6       N = 5 
                P = 3       P = 4       P = 2t      P = 3       P = 4 

15. Caribbean   the         his         Caribbean   some        then/there         2 
                N = 33      N = 3       N = 2       N = 2       N = 1 
                P = 3       P = 3       P = 4       P = 3       P = 2t 

16. grow        plant       make        raise       grow        get                3 
                N = 10      N = 9       N = 7       N = 5       N = 4 
                P = 3       P = 3       P = 3t      P = 4       P = 3 

17. sell        sell        make        raise       grow        use/plant          3 
                N = 21      N = 7       N = 4       N = 3       N = 2 
                P = 4       P = 2t      P = 2t      P = 2t      P = 3 

18. their       the         their       all         and         now/then/later     8 
                N = 14      N = 13      N = 3       N = 3       N = 1 
                P = 3t      P = 4       P = 3t      P = 3       P = 3 

19. tobacco     tobacco     they        he          John Rolfe  It                 7 
                N = 20      N = 16      N = 3       N = 2       N = 2 
                P = 4       P = 2t      P = 2t      P = 2t      P = 3 

20. gold        anything    gold        tobacco     money       much               10 
                N = 7       N = 5       N = 3       N = 3       N = 3 
                P = 2t      P = 4       P = 2       P = 2t      P = 2 

21. river       great       high        deep        low         Jamestown          9 
                N = 4       N = 4       N = 2       N = 2       N = 2 
                P = 3       P = 3       P = 3       P = 3       P = 3 

22. new         new         the         early       people      years              7 
                N = 8       N = 6       N = 4       N = 3       N = 3 
                P = 4       P = 3       P = 3       P = 2t      P = 2 

23. coast       Jamestown   Caribbean   east        settlers    beginning          8 
                N = 5       N = 5       N = 4       N = 3       N = 3 
                P = 2t      P = 3       P = 3t      P = 2t      P = 2t 

24. Jamestown   there       Jamestown   tobacco     It          John Rolfe         10 
                N = 13      N = 10      N = 8       N = 4       N = 2 
                P = 2       P = 4       P = 2       P = 3       P = 2 

25. also        already     now         only        also        settlers/people    10 
                N = 14      N = 10      N = 4       N = 1       N = 1 
                P = 3       P = 3       P = 3       P = 4       P = 2 

(N) Number of students giving this response 
(P) Position on reading continuum where 
1 = Prereading 
2 = Beginning Reading 
3 = Developing Reading 
4 = Fluent Reading 
t = Transition 
(See Figure 4 for further description.) 

Items 20 and 24 also had high omission rates. These two 
items were comparatively distant from the referent. For example, 
item 20 

The people of Jamestown now understood that the great 
resource of their new home was not -20- but farm land. 

called for gold as the response. This was directly related to the first 
sentence of the second paragraph, seventeen sentences away, or to 
the notion of gold as a form of cash. Similarly, in item 24 

By 1733, -24- was only one of many English settlements 
in Virginia 

students were required to jump back in the macrostructure to fill in 
Jamestown although the topic of the preceding paragraph had been 
the land resources. 

Students had the greatest difficulty with register specific 
words, i.e., colonist and settlement. While their responses were 
usually in line with the concept, they did not understand the size 
differential between a colony and a city. 

In summary, results indicated that the book was appropriate 
for the instruction of most of the students. The qualitative analysis 
of responses to particular items suggested possible teaching strate- 
gies (i.e., how the teacher might wish to introduce new vocabulary), 
or raised questions about possible revisions to the textbook before 
publication (i.e., whether also is appropriate for fourth graders). 

Potential Applications of This Methodology 

The methodology has important potential. Teachers could use 
the information from the qualitative analysis as a guideline for les- 
son planning. Too often, teachers who believe students do not un- 
derstand reteach and drill the lesson in the same manner in which it 
was originally presented. As documented, teachers would be able to 
discriminate between types of errors and might develop alternative 
strategies. 

The methodology would be very helpful to adoption 
committees within a school or district. Through the use of sample field 
tests, it would be possible to rank order the textbooks under consid- 
eration. Because of the well documented limitations of readability 
formulas and the constraints of the adoption process, this 
methodology would serve as a more objective measure of suitability. 

The methodology could be of use to publishers as well. If 
they used the methodology during the development phases of new 
textbooks, they would have a measure of comprehensibility beyond 
readability formulas. Based on the qualitative analyses, they might 
consider rewriting parts of books to avoid difficulties students might 
have. Information about specific student needs could be included in 
a teachers' manual. 

This pilot test demonstrates the potential of the methodology. 
Chapman is building a data base of student responses that may elim- 
inate the need for field testing and could drastically reduce the labor 
intensity of the process. 

Conclusions 

There are several significant points to stress. First, classic 
readability formulas serve an important purpose. They are intended 
to and do predict an approximate level of difficulty. Critics would 
like these formulas to account for more of the complexities of text. 
Readability research has shown that the addition of attributes does 
not increase the reliability of the formulas. 

Critics claim readability formulas are detrimental to textbook 
production. This is true when the formulas are applied in ways that 
were never intended, but this is not the fault of the formulas. Other 
sources of information about the quality of texts are available. 

Even though it has roots in the works of Aristotle, text re- 
search is a comparatively new phenomenon. Systematic study relat- 
ing text features to learning dates from the late sixties. 
Textlinguistics, which looks at discourse beyond the sentence, dates 
from the late seventies. At this time it is unrealistic to expect an 
elegant formula to objectively measure text difficulty in the ways 
readability formulas do. 

New methods of assessing text have proven productive. They 
have provided researchers with information about the reading proc- 
ess in general and its relationship to attributes of text. Guidelines for 
text production are an important step toward improving texts. As 
more researchers address the issue, new ways of assessing text may 
evolve, yielding simple, objective ways of calibrating text. 



Toward a New Approach to Predicting 
Text Comprehensibility 



Psychological research on what makes a text readable has a rela- 
tively brief history, going back only about fifty years. Concern 
about text comprehensibility, however, can be traced back 2,500 
years to Greek scholars who were attempting to train Athenian 
lawyers in the arts of policy analysis, exposition, and persuasion, 
topics that constitute the roots of classical rhetoric. While recognizing 
the important work on readability carried out by rhetoricians, this 
chapter nevertheless has a psychological orientation. 

For reading educators, perhaps the most important new un- 
derstanding about readability has come about because of a shift in 
emphasis in psychological study. Behaviorism has been abandoned 
in favor of a cognitive approach to human information processing. 
Researchers are increasingly aware that whenever the reading level 
of the material changes, the nature of cognitive processing changes 
also. Both decoding ability and text topic familiarity influence read- 
ing comprehension performance. When a selection is estimated to 
be at the third grade readability level, we assume it is easy to 
comprehend. If, however, the reader is at the beginning stages of reading 
and is neither accurate nor automatic at decoding, comprehension 
will be low (Perfetti & Lesgold, 1977, 1979; Samuels, 1977). Simi- 
larly, a skilled reader will experience difficulty comprehending even 
relatively simple text when the topic is completely unfamiliar 
(Kintsch & Miller, 1984; Kintsch & van Dijk, 1978; Kintsch & 
Vipond, 1979). To their detriment, readability formulas in use today 
concentrate only on text characteristics, totally neglecting how 
cognitive processing factors influence the comprehensibility of text. 

We begin this chapter by describing the use of readability for- 
mulas in matching readers with material appropriate to their instruc- 
tional level. Then, in keeping with an interactive view of the reading 
process, we review outside-the-head and inside-the-head factors that influence 
comprehensibility. Finally, we suggest a new way to predict text 
comprehensibility. 

Matching Readers with Appropriate Materials 

An important instructional mandate is to assign written mate- 
rial at levels corresponding to individual reading achievement. In 
elementary school classrooms, however, reading ability may vary 
from three to seven or eight grade levels. Further, as students move 
upward through the grades the range increases; the higher the grade 
level, the greater the spread of classroom reading achievement 
within each class (Balow, 1962; Betts, 1957; Bond & Wagner, 
1966). Yet to teach reading effectively or to help students gain infor- 
mation from text requires a match between the difficulty of the read- 
ing material and the reading ability of the child since students make 
optimal gains when instructed at a level where they can succeed 
(Dunkeld, 1970; Johnson, Kress, & Pikulski, 1986; Scarborough, 
Bruns, & Frazier, 1957). Gray and Leary (1935, p. 5) suggest that
"to get the right book into the hands of the right reader" is a pressing
responsibility. 

To solve the problem of matching readers with appropriate
material, researchers developed prediction formulas for estimating
the difficulty of books and reading selections. Typically, two phases
are involved in developing a readability formula with which to judge
text difficulty (Bormuth, 1971; Dale & Tyler, 1934; Gray & Leary,
1935; Pearson, 1974). In the first phase, the developer selects samples
of reading materials at successive levels, then constructs questions
to test readers' ability to comprehend each passage. Next, the
developer chooses a target population to read the information and to
complete the corresponding test items. Subsequently, the developer
uses the resultant mean scores to establish an index of difficulty for
the material. This index serves as a criterion measure to be used as a
dependent variable in the second part of the study.

In the second phase, the developer quantifies factors believed 
to be predictive of reading difficulty. These factors include such se- 
mantic elements as the specialized technical vocabulary in a pas- 
sage; the easy words; and the hard, nontechnical words. Syntactic 
elements considered are the type and length of sentences and the 
number of clauses and prepositional phrases. The developer estab- 
lishes correlations with the previously identified index of difficulty, 
then determines which elements relate most highly to the criterion 
measure. Factors failing to improve the predictive power of the 
equation are dropped, and a final multiple regression equation is 
developed. The resulting formula is used to estimate the reading
difficulty of a wide variety of printed information.
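
The second phase can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: it fits an ordinary least squares equation with two predictors, and the predictor names (percent of hard words, average sentence length), the function names, and the sample values are all hypothetical rather than taken from any published formula.

```python
def fit_two_predictor_formula(rows):
    """Fit criterion = b0 + b1 * word_difficulty + b2 * sentence_length.

    rows: list of (word_difficulty, sentence_length, criterion_grade) tuples,
    one per sample passage. Solves the least squares normal equations directly.
    """
    n = len(rows)
    mean_x1 = sum(r[0] for r in rows) / n
    mean_x2 = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n

    # Centered sums of squares and cross products.
    s11 = sum((r[0] - mean_x1) ** 2 for r in rows)
    s22 = sum((r[1] - mean_x2) ** 2 for r in rows)
    s12 = sum((r[0] - mean_x1) * (r[1] - mean_x2) for r in rows)
    s1y = sum((r[0] - mean_x1) * (r[2] - mean_y) for r in rows)
    s2y = sum((r[1] - mean_x2) * (r[2] - mean_y) for r in rows)

    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = mean_y - b1 * mean_x1 - b2 * mean_x2
    return b0, b1, b2


def predict_grade(coefficients, word_difficulty, sentence_length):
    """Apply the fitted equation to a new passage."""
    b0, b1, b2 = coefficients
    return b0 + b1 * word_difficulty + b2 * sentence_length


# Invented sample: (percent hard words, words per sentence, criterion grade).
SAMPLE_PASSAGES = [
    (4, 8, 2.4), (10, 10, 4.1), (15, 14, 5.5), (22, 15, 7.3), (30, 21, 9.6),
]
coefficients = fit_two_predictor_formula(SAMPLE_PASSAGES)
print(predict_grade(coefficients, word_difficulty=12, sentence_length=11))
```

A real formula would be fitted on many more passages, and candidate predictors that fail to improve the fit would be dropped before the final equation is reported.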

In general, readability formulas are based on two factors: 
word difficulty (as measured by familiarity, frequency, or length) 
and sentence complexity. These two variables represent the highest 
loadings on the regression equations used to predict text difficulty. 
In most cases, the result is that these two outside the head text variables
alone are used almost exclusively to judge reading ease. Critics
of readability formulas most often cite problems with the formulas'
reliability and criterion validity and their disregard for higher level
text organization. Using
formulas as prescriptions for writing also has been censured. (See 
earlier chapters in this volume.) Elaborations of these issues follow. 

Outside the Head Text Factors 

Limitations of Existing Readability Formulas 

Interformula reliability. A critical test of any formula is to
compare readability estimates on the same passage, applying the
formula under consideration and other formulas of established cred- 
ibility. Studies indicate that readability levels differ depending on 
which formula is used. That is, the readability level of a prose selec- 
tion might be rated as most difficult by one readability formula and 
least difficult by another formula. McConnell (1982), for example,
found discrepant results for nine college level introductory economics
texts. One text had a grade level equivalent of 11.1 when
readability was calculated with the Dale-Chall formula, 8.2 when 
the Modified Dale-Chall formula was applied, but 10.7 when read- 
ability was estimated by the Fry formula. Even more striking is the 
fact that rank order of difficulty for the set of books changed accord- 
ing to the formula employed. For example, the difficulty level of 
one economics text was rated by one formula as the easiest of the
nine texts surveyed and by another formula as the next to the hard- 
est. 

Criterion validity. A problem with criterion validity for the 
Lorge, Flesch, and Dale-Chall formulas can be traced back to their 
origin. For convenience (and perhaps because they believed the pas- 
sages to be standardized) these investigators administered the previ- 
ously graded McCall-Crabbs Standard Test Lessons as the reading 
selections on which to base the criterion index of difficulty. These 
researchers correlated text variables, such as word length and sen- 
tence complexity, with McCall-Crabbs passage grade equivalent 
scores, choosing a grade score that designated a comprehension performance
of either 50 or 75 percent. The McCall-Crabbs passages,
however, are inadequately normed and were never intended to be 
employed as criteria for readability formulas. As Stevens (1980a) 
explains, only pupils from New York City public schools were used
in standardizing the selections. The grade scores for each McCall-Crabbs
passage were obtained by smoothing the curve on a graph
connecting student performance on the Thorndike-McCall Reading 
Scales with their test lesson performance. Since the grade score 
equivalents were designed to serve as approximations, records of 
their derivation were never kept. Such casual and insufficient norm- 
ing suggests McCall-Crabbs grade scores are unreliable and invalid 
criteria for developing readability measures. 

Disregard for higher level text organization. A basic limita- 
tion of readability formulas is that they ignore such critical text fac- 
tors as cohesiveness and macrolevel organization. Thus, it is 
possible to randomize every sentence in a text without changing the
tabulated readability. The assigned readability level of an eight word
sentence in jumbled order would not differ from the assigned readability
level of an eight word sentence in normal order (Marshall,
1979). The well organized and the poorly organized text would have 
the same designated difficulty level. 
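
The point can be checked with a short sketch. The code below is an illustration only: it computes two surface measures of the kind formulas rely on (average sentence length in words and average word length in letters) and shows that jumbling the word order leaves both unchanged. The function names and the sample sentence are our own assumptions, and the two measures stand in for, rather than reproduce, any particular published formula.

```python
import random
import re


def surface_features(text):
    """Return (average words per sentence, average letters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_length = len(words) / len(sentences)
    avg_word_length = sum(len(w) for w in words) / len(words)
    return avg_sentence_length, avg_word_length


def jumble_words(sentence, seed=0):
    """Shuffle the words of a single sentence, keeping the final period."""
    words = sentence.rstrip(".").split()
    random.Random(seed).shuffle(words)
    return " ".join(words) + "."


normal = "People thought dew fell from the morning sky."
jumbled = jumble_words(normal)

# Both versions contain the same eight words, so the surface measures agree
# even though only one version is comprehensible.
print(surface_features(normal) == surface_features(jumbled))
```

Any formula built only from such counts will assign the two versions the same difficulty level.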

Prescription for writing. Another drawback of readability 
formulas, resulting in a disservice to students, is the recommenda- 
tion by formula authors that readability indexes be used as guides 
for composing more comprehensible text (Dale & Chall, 1948; Dale 
& Tyler, 1934; Flesch, 1948; Gray & Leary, 1935; Lorge, 1944). 
Gray and Leary suggested shortening the average sentence length 
and decreasing the number of prepositional phrases to increase read- 
ing ease. Flesch proposed that formula study be part of the curricu- 
lum in composition, creative writing, journalism, and advertising 
courses. In fact, rewriting text to conform to a prescribed reading
level may result in text that is more difficult to read. Shorter senten- 
ces may not be the answer. 

Coleman (1971) pointed out that if the number of ideas is 
held constant, understanding is enhanced when text is elaborated on 
or paraphrased. Pearson (1974) contended that text is easier to com- 
prehend when ideas are stretched over several clauses instead of 
packaged into a single clause. Grammatical complexity may add to 
text comprehensibility. 

The following illustration, in which the first sentence is eas- 
ier to understand than its two sentence counterpart, reinforces this 
point. 

1. People thought dew fell from the sky because it can be seen
only in the morning. 

2. People thought dew fell from the sky. It can be seen only in 
the morning. 

In the first sentence, the causal relationship remains intact. 
By stating the ideas in two sentences, the author is trying to decrease
reading difficulty by simplifying grammatical complexity (omitting 
the word because) and by reducing sentence length. Thus the reader 
must make an inference between the two statements. Tailoring text 
to conform to the constraints of readability formulas may detract 
from, rather than enhance, text comprehensibility. 

As shown in the next example, "argument repetition" 
(Kintsch & van Dijk, 1978; Kintsch & Vipond, 1979), or the reiteration
of key words and concepts, also enhances the logical flow of
meaning from one sentence to the next. 

John likes walking to the new shopping mall. The shopping
mall has many quaint shops.

Kintsch and his colleagues (1975) found that for university students
the recall of words and concepts increased as a function of the number
of times the argument or word concept was repeated in the text.
Similarly, Manelis and Yekovich (1976) demonstrated that despite
longer and more complicated sentences, repetition of the same concepts
across sentences facilitated recall. When arguments are repeated,
integrating the ideas in the text and making the relationship
between the author's ideas explicit, the text becomes easier to process
and hence more readable. The directive to simplify and increase
reading ease through shortening is thus, paradoxically, ill advised 
because of the extra processing burden imposed. 

A more global problem exists in regard to rewriting stories to 
make them more readable. Many folktales, fables, and myths that 
appear as basal reader selections have been intentionally altered to 
keep sentences short and to employ high frequency words. Nonethe- 
less, each word change or deletion can result in distortion of seman- 
tic content and syntactic flow. If causal relationships are omitted, 
misunderstanding, not better comprehension, may be the outcome. 
Both common sense and literary taste should militate against the
practice of tampering with sentence length and vocabulary load. As
observed by Finn (1975), colorful low frequency words may carry
nuances of meaning lacking in familiar words. Rare words also may
be repeated within a text, thereby increasing reading ease. In the 
case of informational text, mastering technical words may be essen- 
tial to understanding the substance of the material (Nelson-Herber, 
1985). 

Adjunct Comprehension Aids 

Regardless of criticism concerning the limitations of readability
formulas, the formulas are products of years of cumulative
research originating in the twenties (Dale & Chall, 1948; Dale &
Tyler, 1934; Flesch, 1948; Gray & Leary, 1935; Lively & Pressey,
1923; Lorge, 1939; Vogel & Washburne, 1928). Among the variables
studied, word difficulty and sentence length have been consistently
identified as factors that differentiate text difficulty levels.
Making a judgment about the readability of respective texts on the 
basis of these two factors may be a relatively efficient approach to 
estimating comprehensibility. 

Further, with a group of less competent readers and a mandated
text, educators may facilitate comprehension by highlighting
important information in the text. Such modifications are called adjunct
comprehension aids.

Among adjunct aids, instructional objectives and questions 
placed within the text itself are relatively simple techniques that 
teachers can easily adopt to enhance text comprehensibility. 

Interspersed questions. Empirical evidence supports the gen- 
eral claim that the inclusion of questions in text facilitates learning 
(Anderson, 1980; Anderson & Biddle, 1975; Faw & Waller, 1976). 
Research has shown that questions placed after the text enhance 
learning, both when followup questions are identical to the criterion 
questions and when they are new items. On the other hand, adjunct 
questions appearing before the text are effective only when the fol- 
lowup questions are similar (Frase, 1967; Rothkopf, 1966). Ques- 
tions interspersed throughout the text that appear in close proximity 
to the information on which they are based also are facilitative 
(Rothkopf & Bisbicos, 1967). In addition, higher order questions 
that require students to answer beyond the level of literal response 
are equally appropriate, both when followup questions are identical 
to the original questions and when questions are new (Andre, 1979). 
Depending on both the type and sequencing of questions, their 
placement in expository text improves learning and makes text more 
readable for college students. 

Instructional objectives. Instructional objectives are state- 
ments or study goals presented at the beginning of a text that suggest 
what the reader should know after reading. Research findings are 
inconclusive on the effectiveness of placing instructional objectives 
within the textual format as an aid to learning (Anderson, 1980; Du- 
chastel & Merrill, 1973; Faw & Waller, 1976; Jenkins & Deno, 
1971). For example, specific objectives facilitate intentional learning
for high school and college age students, but have a deleterious
effect on incidental learning if passages are lengthy (Kaplan, 1974;
Rothkopf & Kaplan, 1972). Anderson's overall conclusion seems 
justified; learning is found to be greater when instructional objec- 
tives are explicitly stated than when objectives are not provided. A 
key point is noteworthy here. Objectives tend to enhance learning 
only when they direct students to focus on information they would 
not otherwise perceive as important (Duell, 1974). Too many objec- 
tives may be overwhelming, and students may disregard them alto- 
gether (Duchastel & Merrill, 1973). 

Theoretically, the use of objectives in informational text 
should help students identify information that is important to re- 
member. Rothkopf (1966) advocates the use of such instructional 
objectives because they do not detract from learning and, in particu- 
lar cases, may increase learning from text. While this conclusion is 
logical, further study is required to see whether such research may 
be generalized and instructional objectives used to increase the com- 
prehension performance of elementary school pupils. 

Readability is not an inherent property of text, but the result 
of an interaction between a set of particular text characteristics and 
the information processing characteristics of individual readers 
(Kintsch & Vipond, 1979). Text factors alone cannot determine 
readability; readers' prior knowledge and understanding influence 
comprehensibility and recall. 

Inside the Head Cognitive Factors 

Background Knowledge 

World knowledge plays an important role in reading compre- 
hension (Adams & Collins, 1979; Armbruster & Anderson, 1981; 
Ausubel, 1960; Rumelhart & Ortony, 1977). For example, Kintsch 
and his colleagues (1975) reported a large difference in reading 
times for college students when paragraphs were easy and dealt with 
well known topics from classical history. When paragraphs were 
more demanding, and focused on scientific topics about which sub- 
jects possessed little or no previous knowledge, students took longer 
to read the text and recalled less information. Chiesi, Spilich, and
Voss (1979) also found that it was easier for college age students to
learn more about a particular topic when they possessed high prior
knowledge of that topic. 

Research findings are similar for subjects at other age and 
grade levels. Pearson, Hansen, and Gordon (1979), working with
second grade children, and Stevens (1980b), working with ninth 
graders, showed that subjects highly familiar with the topic not only 
had better comprehension, but recalled more than subjects not 
highly familiar with the topic. Taylor (1979) found that poor fifth
grade readers' comprehension was diminished when their prior
knowledge was low, but the poor readers were able to comprehend
adequately when topics were familiar and written at appropriate dif- 
ficulty levels. Dooling and Lachman (1971), Bransford and Johnson 
(1972), and Bransford and McCarrell (1974) demonstrated that until 
background knowledge is brought to a text, the text may seem incomprehensible.
Well written texts signal to the reader what background
knowledge must be activated to enhance processing.

Insight into the role that prior knowledge plays in facilitating 
text processing has been obtained from such studies as that of 
Blachowicz (1977), who found that readers in second, fifth, and 
seventh grades and college level adults frequently remembered more 
than what was contained in the sentences read as evidenced by the 
false identification of inference statements as statements they had 
encountered in the passages. Blachowicz hypothesized that during 
reading and recall readers use their knowledge of the world to supplement
information in the text. Her research substantiates the classical
work of Bartlett (1932), who found numerous distortions in the
story recalls of English subjects who had read a tale of the Indians of 
the Northwest coast. The distortions made the story comply with the 
past experiences of the readers, suggesting that when misunder- 
standings or lapses in memory occur, readers reconstruct meaning 
based on their previous knowledge and experience. 

The theoretical explanation for this view of reading as a con- 
structive process is that as experiences and attitudes are assimilated 
they form cognitive structures. These cognitive structures, called 
schemata (Anderson, 1977; Rumelhart, 1980; Spiro, 1977), serve 
as a framework for storing information and for interpreting information
implicit in the text. Thus, in Bartlett's study, the subjects'
memory for a story from an unfamiliar culture was distorted be- 
cause recall was based on and conformed with subjects' prior knowl- 
edge. When readers cannot exactly recall aspects of a story, they 
rely on previously formed schemata to reconstruct what might have 
occurred. According to Kintsch and van Dijk (1978), familiarity 
with the facts allows readers to make inferences and to fill in miss- 
ing ideas. Readers recall not only what is stated, but what seems to 
follow. 

Bransford and Franks (1971) demonstrated that college un- 
dergraduates acquired complete ideas from exposure to partial ideas 
and that subjects genuinely believed they had originally been pre- 
sented with the entire idea, when in fact they had not. Research by 
Brown and colleagues (1977) confirms these findings. Subjects in 
their study later had difficulty distinguishing between their own 
story embellishments and the actual prose content. Therefore, sub- 
stantial empirical data support the presence of schemata that provide 
a basis for comprehending, interpreting, and remembering dis- 
course. 

As the literature review has demonstrated, text comprehensi- 
bility cannot be considered a property of text alone, but one of text- 
reader interaction. Accordingly, to estimate text comprehensibility, 
you must have some estimate of the influence of textual factors, such 
as conventional text difficulty level and adjunct comprehension aids, 
and an estimate of cognitive factors, such as the reader's prior 
knowledge of the text topic. But word recognition skill, a second
reader factor, also has a profound effect on reading comprehension 
performance. 

Word Recognition Skill 

A central claim by Perfetti (1977), Lesgold and Perfetti 
(1978), Perfetti and Lesgold (1977, 1979), Stanovich (1980), and 
Perfetti and Roth (1981) is that difficulty in word identification not 
only subtly retards reading comprehension, but also severely dis- 
rupts comprehension of text by interrupting the reader's ongoing 
train of thought. As explained by Samuels (1983), when a person is
automatic at word recognition, little attention to decoding is required
and the available attention can be used for comprehension. If
too much attention is required to decipher words, there will not be 
enough processing capacity left for comprehension. Thus, accuracy 
and automaticity of word recognition facilitate reading comprehen- 
sion. Having to expend effort at decoding in a word by word manner 
leaves the reader with little capacity for higher order reading and 
comprehension. 

The punitive consequences that slow decoders experience are
explained in terms of the limited capacity of the short term memory 
that can retain only four to seven items at one time (Kintsch & Vi- 
pond, 1979). Because poor decoders require considerable process- 
ing space to unlock single words, the theory is that the same 
processing space cannot be used to store previously coded words or 
phrases. Consequently, antecedent words are lost from memory. In 
addition, the reader's capacity to call up existing schemata to predict 
upcoming information will likely be reduced. As a result, compre- 
hension suffers. Once decoding skills function appropriately, the as- 
sumption is that processing capacity will be free and comprehension 
performance will improve. Readers will be able to focus attention
on the task of text comprehension, processing at deep structure
levels as opposed to surface structure levels. 

While observers would argue that some poor comprehenders 
have adequate word recognition skill, studies by Calfee and Drum 
(1978), Golinkoff and Rosinski (1976), and Perfetti and Hogaboam 
(1975) report that the apparent sufficiency in word recognition abil- 
ity is nullified when speed is introduced as a dependent variable. 
Using third and fifth grade pupils as subjects, Perfetti and Hoga- 
boam demonstrated that good comprehenders decode single words 
faster than do poor comprehenders. For common, high frequency 
words used in their study, differences in decoding times were slight
but still significant. For nonwords conforming to English spelling
patterns, however, decoding time differences were more pro- 
nounced. This suggests that less skilled readers expend too much 
effort on word identification and that, to enhance comprehension
performance, readers need to be more than accurate at word recognition.
LaBerge and Samuels (1974) have labeled this additional
dimension associated with word recognition as automaticity.

Both accuracy and automaticity of word recognition are important
to reading comprehension. As a result, word recognition
skill is a second inside the head cognitive factor likely to have a 
significant effect on text comprehensibility. 

A New Approach to Predicting Readability 

A study by Zakaluk (1985) found support for the theoretical 
constructs proposed in the foregoing review of the literature. She 
studied fifth grade students from twelve classrooms in urban, suburban,
and rural schools. The students read passages ranging from
350 to 435 words taken from social studies and science-health texts.
The difficulty of the passages ranged from grades four through
seven (Fry, 1968). After reading, students answered open ended re- 
call questions under various adjunct aid conditions. Results indi- 
cated that across four trials inside the head factors (word recognition 
automaticity, prior knowledge) combined with outside the head fac- 
tors (passage difficulty as estimated by conventional readability 
measures and the use of adjunct comprehension aids) accounted for 
from 40 to 28 percent of the variance. Thus empirical data demonstrate
that both inside and outside factors influence how well a given
reader will comprehend a particular text.

Despite the importance of these factors, formulas currently 
used for estimating text readability fail to take a number of them into 
account. We therefore propose a new procedure to predict text diffi- 
culty using information from both inside and outside the head 
sources. We should be able to make better predictions about the 
ability of individual readers to comprehend particular texts by considering
all elements: traditionally measured text factors (word difficulty
and sentence length), adjunct aids (an outside the head
factor), and the reader's prior knowledge and reading skills. 

In order to simplify this process, we use a nomograph. A 
nomograph is a table that uses information from two sources to provide
information about a third area of interest. A common application
for a nomograph is to obtain an estimate of the percent of body
fat. To do this, we measure skinfold thickness from two different 
parts of the body, such as the back of the upper arm and the rib cage.
We plot these two readings on two scaled vertical lines and then connect
the points by using a ruler. Between the two outside lines is a
third vertical line that serves as the predictor variable. Where the 
ruler crosses this predictor variable is the percent of body fat. We 
propose to estimate reading comprehension performance by the 
same process. 
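
The ruler step is simple geometry, and a brief sketch may make it concrete. The function below is an illustration under stated assumptions: the heights are physical positions measured up the page (not scale values), the center line is taken to lie a given fraction of the way from the left line to the right line, and the function name is our own. It does not reproduce the printed scales of any actual nomograph.

```python
def crossing_height(left_height, right_height, fraction=0.5):
    """Height at which a straight ruler line crosses an intermediate vertical line.

    left_height, right_height: positions of the two plotted points, measured
        up the page in the same unit (for example, centimeters).
    fraction: horizontal position of the intermediate line, from 0.0 (on the
        left line) to 1.0 (on the right line). The default of 0.5 assumes a
        line midway between the two outer lines.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0.0 and 1.0")
    return left_height + fraction * (right_height - left_height)


# With points plotted 2 cm and 6 cm up the outer lines, the ruler crosses a
# midway center line 4 cm up the page.
print(crossing_height(2.0, 6.0))
```

The value read from the center line then depends on how that line is labeled, which the nomograph designer fixes in advance.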

Using the Nomograph 

Figure 1 shows the nomograph with three vertical lines. On 
the left a scale indicates outside the head factors that influence comprehensibility.
These include text readability level and adjunct comprehension
aids. Text readability ranges from grade one through
college level. On the vertical line to the right, we find inside the 
head factors that influence comprehension. These include knowl- 
edge of text topic and word recognition skill. The center line indi- 
cates the extent to which an individual can comprehend the text in 
question.

Inside the Head Factors

Measuring word recognition. There are three levels of word 
recognition skill. The lowest is the nonaccurate level, at which stu- 
dents experience difficulty in word recognition. The second is 
called accurate but not automatic. At this level, students devote most 
of their attention to decoding, leaving little capacity to consolidate 
overall meaning. At the third level, students are both accurate and 
automatic at word recognition. When students are automatic, word 
recognition requires minimal effort, thus allowing readers to focus 
their attention on obtaining meaning. 

Figure 2

Worksheet for Prior Knowledge Responses

Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________
Fossil Fuels ______________________________

The worksheet should contain about 25 lines with the stimulus words written at the
beginning of each line.

The simplest way to determine students' level of word recognition
is to have them read orally from a 150 word passage that is at
their grade placement level in terms of readability. Instruct students
to read orally and to be able to tell what they remember when they
finish. If students' word recognition accuracy is less than 95 percent,
they are labeled nonaccurate. If students score above 95 percent
in word recognition accuracy but experience difficulty retelling
what they read, they fall into the accurate but not automatic category.
If students achieve over 95 percent in terms of word recognition
accuracy and can retell the gist of the passage satisfactorily,
they are both accurate and automatic. Nearly accurate oral reading
and the ability to capture the gist of the selection indicate automaticity
because we assume that simultaneous decoding and comprehension
take place only when decoding is automatic.
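
The decision rule for the oral reading check can be written out as a short sketch. The function name and parameters below are hypothetical. The text does not say how to treat a score of exactly 95 percent, so the sketch assumes, as a labeled choice of our own, that exactly 95 percent counts as accurate. Judging whether a retelling captures the gist remains the teacher's call and is passed in as a yes or no value.

```python
def word_recognition_level(words_in_passage, errors, retells_gist,
                           accuracy_cutoff=95.0):
    """Classify a student's word recognition from one oral reading.

    words_in_passage: number of words read (about 150 in the procedure).
    errors: number of word recognition errors tallied by the teacher.
    retells_gist: True if the student retold the gist satisfactorily.
    accuracy_cutoff: percent accuracy separating nonaccurate from accurate.
        Treating exactly 95 percent as accurate is an assumption.
    """
    if words_in_passage <= 0:
        raise ValueError("passage must contain at least one word")
    accuracy = 100.0 * (words_in_passage - errors) / words_in_passage
    if accuracy < accuracy_cutoff:
        return "nonaccurate"
    if not retells_gist:
        return "accurate but not automatic"
    return "accurate and automatic"


# Ten errors in 150 words is about 93 percent accuracy.
print(word_recognition_level(150, 10, retells_gist=True))
```
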

Knowledge of text topic. We can use a word association technique
to measure topic familiarity (Zakaluk, Samuels, & Taylor,
1986). A key word or phrase that embodies the main idea of the
topic is chosen to serve as a stimulus. Students are required to write
as many words or ideas as they can think of in association with the
key word. Having the students use lined paper on which the stimulus
word is printed at the beginning of each line (Figure 2) ensures
that they continue to use the original word or phrase, not newly produced
words, to generate ideas. Figure 3 gives sample directions for
this task. Give the students three minutes to generate words and
ideas.

Figure 3

Instructions for Administering the Word Association Task

Introduction

This is a test to see how many words you can think of and write down in a short
time.

You will be given a key word and you are to write as many other words as you
can that the key word brings to mind.

The words you write may be things, places, ideas, events, whatever you think
of when you see the key word.

Modeling and Chalkboard Demonstration

For example, think of the word king. (Write king on the chalkboard.) Some of
the words or phrases that king brings to mind are queen/prince/palace/Charles/
London/kingdom/England/ruler/kingfish/Sky King/of the road. Continue to
brainstorm for other words. Add these to the chalkboard list. You may use two
words, phrases, long words, or short words. Any idea is acceptable, no matter
how many words.

Practice with Discussion

Work on practice sheets. Kitchen and transportation are two highly familiar
topics. Following completion of the activity, clarify the task by sharing ideas and
discussing any questions.

Reminders

The following reminders are given during practice and during the actual task.

1. No one is expected to fill in all the spaces on the page, but write as many
words as you can think of in association with the key word.

2. Be sure to remember the key word while writing because the test is to see
how many other words the key word calls to mind.

3. A good way to do this is to repeat the key word over and over to yourself as
you write.

Responses are scored with one point being awarded for each 
reasonable idea unit up to a maximum of ten points. No credit is 
granted for unreasonable associations, for example, the word sand- 
wich in conjunction with the stimulus word paper. When generated 
words or phrases consist of a list that can be subsumed under a su- 
perordinate category, one point is given for the superordinate cate- 
gory and one point for all of the subordinate ideas. For example, if 
the key word is farming and a student lists the names of a series of
crops such as wheat, barley, corn, rye, and oats, one point is given
for the superordinate word crops and one point for the itemized 
products. In this case it is assumed that students have begun to use
the generated words rather than the stimulus word as cues for producing
ideas.
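
The scoring rule can be summarized in a brief sketch. It assumes the rater has already judged each response as reasonable or not and has grouped any itemized lists under their superordinate category; the function name and data layout are hypothetical. We read the rule as awarding two points in total for each such list (one for the category, one for all of its members together), which is our interpretation of the crops example.

```python
def prior_knowledge_score(idea_units, listed_categories=None, max_points=10):
    """Score a word association worksheet.

    idea_units: list of (response, is_reasonable) pairs for responses that
        are not part of an itemized list; the rater supplies the judgment.
    listed_categories: dict mapping a superordinate category to the list of
        subordinate items the student wrote, for example
        {"crops": ["wheat", "barley", "corn", "rye", "oats"]}.
    max_points: ceiling on the total score (ten in the procedure).
    """
    listed_categories = listed_categories or {}
    points = sum(1 for _, is_reasonable in idea_units if is_reasonable)
    for items in listed_categories.values():
        if items:
            points += 2  # one for the category, one for all items together
    return min(points, max_points)


responses = [("tractor", True), ("barn", True), ("sandwich", False)]
crops = {"crops": ["wheat", "barley", "corn", "rye", "oats"]}
print(prior_knowledge_score(responses, crops))
```

Capping the total at ten keeps the score on the range the nomograph expects.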

Outside the Head Factors 

Readability level. To establish the difficulty level of the text, a 
readability formula such as Fry's (1968) may be used. If the text is 
one used in the classroom, such as a basal reader or a social studies 
text, the readability level is often the same as the grade level for 
which the text is designed. Be cautious in applying this rule, however,
because different selections within the text may vary.

Adjunct comprehension aids. In addition to text readability 
level, another outside the head factor that influences comprehension 
is adjunct aids, such as statements of objectives or study questions 
located within the text or at the beginning or the end. These adjunct 
comprehension aids highlight important information and increase 
the depth of text processing. Their presence in the material adds to
text comprehensibility. 

We have described how to determine inside and outside the 
head factors. Now we indicate how to use these with the nomograph 
shown in Figure 1. 
Plotting the Nomograph

To establish the outside the head factors, enter the scale at the 
text readability level. To determine a composite outside the head fac- 
tor score, subtract a half point if a statement of objectives is present 
and another half point if study questions are present. 

To determine the inside the head factors, begin with word
recognition skill. Add zero if the student is nonaccurate (below 95
percent when substitutions, mispronunciations, word omissions,
additions, and repetitions that involve two or more words are tallied
from the oral reading); one point if the student is accurate but not
automatic; and two points if the student is both accurate and
automatic, as outlined in the following scale.

Points    Word Recognition Level
0         Nonaccurate
1         Accurate but not automatic
2         Accurate and automatic

For knowledge of text topic, add the score obtained from the 
word association task (maximum of ten points) to the word recogni- 
tion score, thus establishing an overall score for plotting on the in- 
side the head factor scale. Connect the plotted scores on the two 
outside scales with a ruler and read the predicted level of compre- 
hension performance on the center scale. This will be either high, 
average, or low. 
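
The arithmetic behind the two outer scales can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, automaticity is taken as a yes/no judgment already made by the teacher, and the sample text and student are invented. The final step, reading high, average, or low from the center scale, depends on the printed nomograph and is not reproduced here.

```python
def word_recognition_points(accuracy_percent: float, automatic: bool) -> int:
    """Map oral reading results to the 0-2 word recognition scale."""
    if accuracy_percent < 95:
        return 0  # nonaccurate
    return 2 if automatic else 1  # accurate, with or without automaticity


def outside_head_score(readability_level: float,
                       has_objectives: bool = False,
                       has_study_questions: bool = False) -> float:
    """Readability level minus a half point for each adjunct aid present."""
    score = float(readability_level)
    if has_objectives:
        score -= 0.5
    if has_study_questions:
        score -= 0.5
    return score


def inside_head_score(recognition_points: int, association_score: int) -> int:
    """Word recognition points plus the word association score (0-10)."""
    if recognition_points not in (0, 1, 2):
        raise ValueError("word recognition points must be 0, 1, or 2")
    if not 0 <= association_score <= 10:
        raise ValueError("word association score must be between 0 and 10")
    return recognition_points + association_score


# Hypothetical case: a sixth grade text with study questions only, read by
# a student who is 97 percent accurate, not automatic, and lists four ideas.
outside = outside_head_score(6, has_study_questions=True)
inside = inside_head_score(word_recognition_points(97, automatic=False), 4)
print(outside, inside)  # 5.5 5
```

These two numbers are the points to mark on the outer scales before laying the ruler across the nomograph.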

Figure 4 illustrates the application of the nomograph for text 
written at college levels. For outside the head factors, the text read- 
ability level is college and there are no adjunct aids. For inside the 
head factors, the student is automatic in terms of word recognition 
(two points) and generated five ideas on the word association task 
(five points). The student's inside the head score is thus 7. When the
inside and outside the head figures are connected, the predicted 
level of comprehension as indicated on the center line is just below 
average. 




Figure 4

Application of the Nomograph for College Level Text

[Nomograph scales: Outside the Head Factors (Text Readability Level;
Adjunct Comprehension Aids) and Inside the Head Factors (Word
Recognition Skill: A Nonaccurate, B Accurate but not automatic,
C Accurate and automatic; Knowledge of Text Topic)]



Figure 5 

Application of the Nomograph for Primary Level Text 
Figure 5 illustrates the application of the nomograph at primary
levels. The outside the head factors show that the text readability
level is grade two and there are no adjunct comprehension aids.
The inside the head factors indicate that the student is accurate but 
not automatic at word recognition (one point) and obtained a score 
of 7 on the word association task (seven points). The inside the head 
factor score is therefore 8. The predicted level of comprehension for 
that student consequently is high average. 

Our final example is of a student who is reading a tenth grade 
text that contains a statement of objectives as well as questions inter- 
spersed throughout the text. We enter the outside the head scale at 
grade ten and subtract one-half point for each of the adjunct compre- 
hension aids (subtract because adjunct aids decrease the difficulty 
level of the text). The result gives us an outside the head composite 
score of 9. The student is automatic at word recognition (two points) 
but has no prior knowledge (zero points). The combined inside the 
head factor is therefore two, and the predicted comprehension
performance is low, as indicated in Figure 6.



Figure 6 

Application of the Nomograph for a Secondary Level Text 
The nomograph was validated by comparing the comprehension
performance it predicts for hypothetical readers with the predictions
of a more detailed nomograph that was developed from the actual
performance of the 253 fifth grade readers who were subjects in
Zakaluk's study (1985) cited earlier. When hypothetical cases were
generated to test the degree of overlap between the two nomographs,
the predictions of the simple to use nomograph presented in this
chapter matched those of the complex nomograph on 30 of 36
comparisons (r = .93), upholding its validity.
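
For readers who want the match count as a proportion, the arithmetic is sketched below. This is only an illustrative calculation; the function name `agreement_rate` is hypothetical. The proportion of matches is a different statistic from the reported r = .93, which is a correlation and cannot be recomputed from the two counts alone.

```python
def agreement_rate(matches: int, comparisons: int) -> float:
    """Proportion of comparisons on which two sets of predictions agree."""
    if comparisons <= 0:
        raise ValueError("comparisons must be positive")
    return matches / comparisons


# Counts reported for the validation: 30 matches in 36 comparisons.
print(f"{agreement_rate(30, 36):.1%}")  # 83.3%
```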

Conclusions 

Students make optimal learning gains when instructional text
matches their reading achievement level. Current measures to
estimate text difficulty are inadequate because they consider only
one source of information: that contained on the printed page. In
addition to the outside the head factors of text readability level
and the use of adjunct comprehension aids, inside the head factors
also must be considered in predicting the difficulty level of a
particular text for a particular reader. These inside the head
factors include knowledge of text topic and degree of reading
fluency.

This nomograph takes the prediction of text readability one step
further by bringing in powerful variables that influence text
comprehension. The teacher is thus able to make better estimates
about which students will comprehend with ease and which will
require extra attention, and to adjust instructional approaches
accordingly, spending more time on developing word familiarity or on
building and activating prior knowledge. The nomograph draws
readability out of the stage of behaviorism, where only outside the
head factors were examined, and brings the concept into the realm of
cognitive psychology, where both inside and outside the head factors
are examined to predict reading performance.